If you expect that the vectors will *not* be in the Set most of the time,
you should try using a bloom filter.  There is a BloomFilters.jl package
that should get you started.  If that's not fast enough, it might also be
useful to tailor the bloom filter implementation to your data.  Note that
you'll still need a Set of some sort.

Cheers,
   Kevin


On Sun, Mar 2, 2014 at 1:08 AM, Ivar Nesje <[email protected]> wrote:

> There is never any one "most performant way in julia". There is often a
> pretty/convenient way, that fits well with the language idioms, standard
> library and can be implemented in a oneliner, and while a lot of effort has
> gone into making that solution performant, it is seldom the best possible.
>
> The most performant way depends on the number and size of the vectors, and
> how different they are. If there is a high probability for the first
> element to be unique, you might store a vector of vectors in a dictionary
> keyed by the first element. There are also lots of other tricks that
> exploits the size and shape of data to get performance. Hashing and random
> lookups are often slow, and even though your algorithm book says access is
> O(1), it is slower than O(n^2) algorithms for small n.
>
> The standard Hash function for abstractarray is defined in dict.jl
>
> function hash(a::AbstractArray)
>     h::Uint = hash(size(a))+1
>     for i=1:length(a)
>         h = bitmix(h,int(hash(a[i])))
>     end
>     return h
> end
>
> You can actually just use arrays as a key in Dict or Set as they are, but
> there are some caveats, that you need to take into account. The dict does
> not take a copy of the key, and if you mutate the key, you will lose lookup
> access to the value stored. I (and the designers of Python), think this is
> a problem that we should not give programmers the possibility to have, but
> Stefan disagrees and have tried to make hashing just work. You will then
> have to deal with the consequences if you do not understand their thinking.
>
> Here is an example on how hashing mutable collections work.
>
> julia> a = [1:10];
> julia> b = Dict{AbstractArray, Int}();
> julia> b[a] = 5
> 5
> julia> b[1:10]
> 5
> julia> b[a]
> 5
> julia> a[5] = 9
> 9
> julia> b[a]
> ERROR: key not found: [1,2,3,4,9,6,7,8,9,10]
>  in getindex at dict.jl:582
>
> julia> b[1:10]
> ERROR: key not found: 1:10
>  in getindex at dict.jl:582
>
> julia> b
> [[1,2,3,4,9,6,7,8,9,10]=>5]
>
> If you change the line where you use a as a key to
> julia> b[copy(a)] = 5
> you get more expected results.
>
> Ivar
>
> kl. 09:02:52 UTC+1 søndag 2. mars 2014 skrev Jonathan Malmaud følgende:
>
>> What's the most performant way in Julia to maintain a set of vectors and
>> check if a given vector is in the set? I was thinking of converting the
>> vectors to tuples and then using "in(::Set, ...)" but I'm worried about the
>> performance penalty of continuously converting vectors to tuples.
>>
>

Reply via email to