> I do think there is room for negotiation about the exact naming and > behaviour of the methods in the interface. Returning a Set of > Map.Entrys seems to be useful, but returning a Trie would be tie closer > with the SortedMap interface. I haven't got a strong opinion here but > it would be nice to hear the pros and cons.
This is simple to implement and I cannot think of any disadvantages. It sounds like a good idea to me. > And regarding comments from Pranas - I can't think where /I/ would want > to use a non-string based trie, but I don't see what's gained by > restricting the interface to Strings. > > <thinking-out-loud> > Maybe Gene sequences would be a useful nonstring example... Sure they > can be represented as Strings of A,G,C,Ts but they must be better served > using a 2 bit encoding rather than 16? > </thinking-out-loud> Any information which is indexed by a sequence would be appropriate. Strings, genes and numbers are examples where the sequence is made up of a finite "alphabet". Other examples which are a sequence with an infinite alphabet are URIs and file names. > > I agree that we need to focus on the interface and > > implementation to make sure that they're the best we can come up with. > > +1 I think this is probably the best aim. However it's pretty difficult > to work out the best interface without using it a bit. Maybe we should > commit the current code with a strong "THIS IS ALPHA" warning and let > people start to use the code and submit patches to get the abstraction > right? We already have a precedent in the primitives subpackage of code > that is in CVS but isn't considered ready for release so I don't see > that concerns over back compat should be an issue. Ok, I'll jump in and level a few criticisms at the interface and implementation I submitted. :-) --- Trie - is it the right name? --- I must admit, the more I've looked at this, the more uncomfortable I am with the name. In my reading, I've only ever heard 'trie' applied to a data structure indexed by strings. However, a trie does seem to fit into a more general class of data structures. It looks like these structures are called prefix trees--although I can't find a solid definition. However, it may be a very elementary data structure: http://proteacher.net/archive/posts/2001/09/26/21017.html So conceptually, we have this sort of breakdown: Prefix trees - Prefix trees for handling strings - Trie - The data structure used in suffix trees - Prefix trees for handling binary data - Patricia trees - Prefix trees for handling sequences of anything - AbstractObjectArrayTrie (which should be renamed) --- Separating the interface from the implementation --- Perhaps the interface doesn't need to be named after the implementation at all. There may be other data structures which provide efficient operations on the prefix of keys, but aren't implemented as a prefix tree. --- Suggestion --- Here's a rough suggestion. /** * A Map which provides several operations based on the * prefix of keys. */ public interface PrefixMap extends Map { public PrefixMap subPrefixMap(Object keyPrefix); public Map.Entry getLongest(Object candidateKey); } There is a nice symmetry in providing a SortedPrefixMap, although actual implementation would involve a reasonable amount of work. (But I'll do it if there is consensus about it.) public interface SortedPrefixMap extends PrefixMap, SortedMap { // This would return a SortedPrefixMap. I can't override the // return type, can I? public PrefixMap subPrefixMap(Object keyPrefix); } /** * An implementation of PrefixMap. Subclasses must provide an * implementation for translating keys to and from object arrays * and a mechanism for storing outgoing references. */ public abstract class AbstractObjectArrayPrefixTree implements PrefixMap { ... } public abstract class AbstractObjectArraySortedPrefixTree extends AbstractObjectArrayPrefixTree implements SortedPrefixMap { // extra SortedMap methods // requires subclasses to return items in order with // outgoingTransitionIterator(). ... } /** * Stores outgoing references in a Map, subclasses must * still provide an implementation for translating keys to * and from object arrays. */ public abstract class AbstractOutgoingTransitionPrefixMap extends AbstractObjectArrayPrefixTree { ... } public abstract class AbstractOutgoingTransitionSortedPrefixMap extends AbstractObjectArraySortedPrefixTree { // needs to be passed in comparator ... } /** * A SortedPrefixMap for strings. */ public class Trie implements SortedPrefixMap extends AbstractObjectArraySortedPrefixTree { // Future implementations may not extend this // class. They may be more efficient, and based // on native chars or ints. ... } Any comments on this? Particularly: - The hierarchy of abstract classes. Should it be flattened? At the moment I'm separating out storage of the transitions. This is nice, but perhaps it clutters the package too much? - Stephen has mentioned some sort of TrieUtils? Perhaps this should be called PrefixMapUtils, or go into MapUtils. Rich -- To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]> For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
