In case someone here sees some use of this for our URI handling... -------- Original Message -------- Subject: Re: [Collections] [SUBMIT] Trie Date: Thu, 5 Dec 2002 14:31:28 +1300 (NZDT) From: Rich Dougherty <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]>
I've written an implementation for a trie which I'd like to contribute. I've attached a simple implementation and a test case.I've taken a quick look at the code here, and that all looks fine. Unfortunately, I don't really know what I'm looking at! What I mean is that I've never heard of a 'Trie' before, and am thus wondering as to why I would use it, and is it general enough to be in [collections]. In the past we have had discussions about Tree implementations (and I have coded some before). This may be related, as the Trie appears to be a recursive structure. Perhaps, some use cases as to why a Trie is useful, especially in a general/server environment would be useful. Thanks.
That sounds reasonable to me. :-) I wrote this trie to solve a specific problem for me. In a web application I'm writing I have a set or URIs that map to a handler for a request. I have mappings like: / -> home page /article -> article list page /article/view -> view article page /article/edit -> edit article page /auth/login -> login page /auth/logout -> logout page I wanted a data structure to store these mappings. I have written a class called SimplifiedURITrie which extends AbstractTransitionMapTrie to store mappings of these type. I submitted StringTrie as a concrete example of the Trie interface. I haven't submitted SimplifiedURITrie since I felt it was very specific to my application, but if you're interseted I've attached it. If you want to include it in Commons (which I doubt), then I'm happy to tidy it up a bit. Back to my application. A trie that held these mappings would look like this: + -> home page | +-article-+ -> article list page | | | +-view-+ -> view article page | | | +-edit-+ -> edit article page | +-auth-+ | +-login-+ -> login page | +-logout-+ -> logout page Notice how the key (the URI) is broken into parts with one part being stored on each transition. All keys that share a common prefix hang off a particular node. See the diagram at http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Tree/Trie.html for an example with strings. For my application a trie was useful for a couple of reasons: - Space efficiency. Each part of the key is only stored once. I would only need to store the transition "article" once even if I had hundreds of URLs starting with "/article". The actual space efficiency of a trie depends on how efficiently the transitions are stored, and the nature of the data stored in it. A dictionary of strings could be stored in a trie quite efficiently. However, I wouldn't suggest the current StringTrie implementation be used for this reason. It uses a HashMap for storing the transitions from each node, which means it uses more space than is strictly necessary. - Can efficiently find the entries whose keys are the prefix of a given value. The method getLongest() returns the longest of these keys. This is useful for me in my URI matching problem. If someone gives me the URI "/article/view/43" I can search the trie and find that the best match is "/article/view". (I'll pass the "43" along to my handler so it can show the appropriate article.) I also added the method prefixEntrySet() to the interface because it was efficient to perform and seemed like it might be useful. Prompted by your question I did a bit of searching and found that it would be handy for substring matching using a "suffix trie": http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Tree/Suffix/ Cool huh. I'm not really a trie zealot, but I think a trie is a reasonably useful data structure. I couldn't find an implementation with a suitable license anywhere, so I wrote my own. I found an implementation in a servlet container written by someone at HP (open source, with heavy restrictions) and one in LimeWire (GPL, not LGPL or Apache). Also many implementations were too limited for me. Most operated only on Strings or byte arrays, I wanted something more general. I hope that helps. Rich -- Nicola Ken Barozzi [EMAIL PROTECTED] - verba volant, scripta manent - (discussions get forgotten, just code remains) ---------------------------------------------------------------------
SimplifiedURITrie.java
Description: Binary data
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, email: [EMAIL PROTECTED]