Hi Steve,

I'm sorry that you haven't gotten faster responses.  I think the main issue
is that not many people have worked on data structures in Julia, so there
aren't too many people who feel qualified to respond.  You might get a
faster response if you split up your questions among multiple emails/posts.

Some responses inline below.


As mentioned in earlier posts, I'm working on a 2-3 tree implementation of
> sorted associative arrays (analogous to map and multimap in C++).  This is
> like Dict in Julia, except that the key/value pairs can be retrieved in
> sort-order on the keys.
>
> For my preliminary implementation, the timing for a sequence of  10^7
> insert and find operations is encouraging: my Julia 2-3 trees: 10 sec; my
> Julia 2-3 trees with @inbounds everywhere: 8 sec; my C++ 2-3 trees: 20 sec;
> my C++ 2-3 trees with /O2 compilation: 6 sec; native C++ map (red/black
> trees) with /O2 compilation: 5 sec.
>

That is quite encouraging!


> I have a few additional questions.
>
> (1) Iteration: The Julia manual describes a start/next/done formulation
> for iteration in which the iterator returns (key,value) pairs.   However,
> this is not adequate for some uses of maps.  In C++ an iterator is like a
> pointer to the (key,value) pair or like a subscript of an array.  The
> usefulness of this approach is that the iterator itself can be stored in
> another data structure and then dereferenced much later.  I'm wondering
> whether there is a convention in Julia for naming and operating with this
> kind of iterator.  (It could be related to the 'state' data item described
> in 2.1.7 of the manual.)
>

In Julia, it is common to use a separate immutable data type to store the
state of an iterator, and if set up properly, the iterator can be
dereferenced later.  The Iterators.jl package and base/iterators.jl have a
number of examples of this type of iterator, and that's the closest thing
to a Julia convention. (Did we talk about this already?)

The main things to be careful of using that idiom are 1) even if the state
itself is immutable, the referenced values may be types that allow mutation
of their values, 2) the state might point into a container (e.g., an array
location), and may not be valid if the container contents change, or 3) the
state may refer to an object that itself might change or no longer be part
of the datastructure (e.g., a node in a tree).  The object won't be garbage
collected since there's a reference to it, but it won't be useful either.


> (2) Related to the above question: if m is the map and i is the iterator,
> then a second getindex can be implemented so that m[i] returns the
> key-value pair.  However, it is a bit confusing to overload [] in this way
> because of the resemblance to m[k] where k is a key that returns just the
> value.  Does Julia offer the possibility of overloading some other
> subscript notation besides []? For example, maybe Julia could implement a
> syntax like m[. i .] or something that would call getindex2 and setindex2?
>  it seems to me that there will be other cases when a user has a data
> structure that can be indexed in more than one way.  I suppose this must
> already come up with sparse matrices?
>


At this point, there is no direct way to index an item in more than one
way.  For Julia's Associative types, getindex(key) returns the value
associated with that key.  The default iterator returns (key, value) pairs,
and you can just iterate over keys or values with keys(d) or values(d).
 See the associative data structures section of the Standard Library manual
for more information.

I'm curious how else you want to index?  If you have the key, creating the
(key, value) pair is straightforward.  Trees typically don't allow indexing
by position, though it obviously can be done by storing extra information
in the tree.  Is this what you'd like to do?

One way to do this would be to create a hidden wrapper class, which would
allow this.  For example:

myMap = SortedDict({"aardvark"=>"Super Cool!", "armadillo"=>"How Texan!",
"zebra"=>"African Mambo"})
println(arrayview(myMap)[1])   # prints ("aardvark", "Super Cool!")
av = arrayview(myMap)
println(av[end])               # prints ("zebra", "African Mambo")

Here, arrayview would return an array-like view into the dictionary (which
would be implemented as a separate type.

I don't think this has precedent in Julia, though I think that's mostly
because the data structures haven't become that sophisticated yet.  I might
use this idea for OrderedDicts, though.


>
> (3) @inbounds: Is there a way for a user-written data structure to check
> whether @inbounds is active, and if so, to use a different faster version
> of (user-written) getindex and setindex?
>

I believe the answer is no, since bounds checking is inserted as needed
done during compilation.  I think it would probably be a bad idea for user
code to need this level of reasoning.  If you explicitly check bounds
somewhere, your code itself can use `@inbounds`.

I'm curious here, too, where inbounds is useful in a tree data structure?
 Are you backing the tree with arrays?

 -----

Hope this is useful.

Cheers!
   Kevin

Reply via email to