There is never any one "most performant way in julia". There is often a
pretty/convenient way, that fits well with the language idioms, standard
library and can be implemented in a oneliner, and while a lot of effort has
gone into making that solution performant, it is seldom the best possible.
The most performant way depends on the number and size of the vectors, and
how different they are. If there is a high probability for the first
element to be unique, you might store a vector of vectors in a dictionary
keyed by the first element. There are also lots of other tricks that
exploits the size and shape of data to get performance. Hashing and random
lookups are often slow, and even though your algorithm book says access is
O(1), it is slower than O(n^2) algorithms for small n.
The standard Hash function for abstractarray is defined in dict.jl
function hash(a::AbstractArray)
h::Uint = hash(size(a))+1
for i=1:length(a)
h = bitmix(h,int(hash(a[i])))
end
return h
end
You can actually just use arrays as a key in Dict or Set as they are, but
there are some caveats, that you need to take into account. The dict does
not take a copy of the key, and if you mutate the key, you will lose lookup
access to the value stored. I (and the designers of Python), think this is
a problem that we should not give programmers the possibility to have, but
Stefan disagrees and have tried to make hashing just work. You will then
have to deal with the consequences if you do not understand their thinking.
Here is an example on how hashing mutable collections work.
julia> a = [1:10];
julia> b = Dict{AbstractArray, Int}();
julia> b[a] = 5
5
julia> b[1:10]
5
julia> b[a]
5
julia> a[5] = 9
9
julia> b[a]
ERROR: key not found: [1,2,3,4,9,6,7,8,9,10]
in getindex at dict.jl:582
julia> b[1:10]
ERROR: key not found: 1:10
in getindex at dict.jl:582
julia> b
[[1,2,3,4,9,6,7,8,9,10]=>5]
If you change the line where you use a as a key to
julia> b[copy(a)] = 5
you get more expected results.
Ivar
kl. 09:02:52 UTC+1 søndag 2. mars 2014 skrev Jonathan Malmaud følgende:
>
> What's the most performant way in Julia to maintain a set of vectors and
> check if a given vector is in the set? I was thinking of converting the
> vectors to tuples and then using "in(::Set, ...)" but I'm worried about the
> performance penalty of continuously converting vectors to tuples.
>