braver wrote:
Wren -- thanks for the clarification!  Someone said that Foldable on
Trie may not be very efficient -- is that true?

That was probably me saying that I had worked on some more efficient implementations than those currently in use. Alas, the more efficient ones seem to alter the order of enumerating the values, and I haven't had a chance to fix that (so they're not being used currently).

As for general efficiency, I haven't run any benchmarks for folds vs other datastructures so I can only speculate. If you're using foldl or foldr instead of foldrWithKey[1] then the performance should be comparable to Data.Map. For both data structures, all they do is traverse the spine and enumerate the leaves, so the indexing data on the spine nodes is irrelevant.


What's the exact relationship between Trie and Map and their
respective performance?

Data.Trie is a big-endian patricia tree with node fusion, which is the same datastructure as Data.IntMap. The only difference is that IntMap can only take fixed-length keys (namely Ints) whereas Trie can take variable length keys (i.e., any string of bits). On the other hand, Data.Map is a form of balanced tree, which is rather different. The API documentation gives links to the relevant papers.

The API documentation also gives their asymptotic performance. I don't have all the numbers for Trie yet because my sources didn't give them and I haven't had the chance to prove them myself, but they will be similar to those for IntMap. In general, because it is specialized for the key type, Trie will be more efficient than (Map Bytestring) just as IntMap is more efficient than (Map Int).

The exact performance characteristics can vary depending on the specific program, however. Because patricia trees and balanced trees are such different structures, they excel for different usage patterns. For example, if you wanted to get (or enumerate) the submap containing only the keys beginning with "foo", tries will be much more efficient than the alternatives. However, if you need to use functions like foldrWithKey often, then tries will be at a disadvantage since they must reconstruct the keys.[1] As I recall, tries are also at an advantage if you often merge/meld maps; though I don't recall the algorithm Data.Map uses to be sure of that.

Similarly, since Map, IntMap, and Trie are persistent data structures, if you're not using that persistence then you may get better performance out of an ephemeral data structure like hash tables. Though you should first check to make sure that IntMap + interning is insufficient, since a big part of hash table's performance characteristics comes from the interning implicit in the hash function.

Choosing the right tool depends on how you want to use it.



[1] If you do need to use foldrWithKey often, an easy way to improve performance is to use a Trie (ByteString,Foo) and store the complete key alongside the value. This way you can use the faster foldl/foldr and still have access to the key. This will increase memory consumption, of course, though Trie does try to maximize sharing of key segments, so it won't cost as much as initially expected.

--
Live well,
~wren
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Reply via email to