> I think, if we add ord as an output to the FST, then it builds > everything we need? Ie no further data structures should be needed? > Maybe I'm confused :)
If you put the ord as an output the common part will be shifted towards the front of the tree. This will work if you want to look up a given value assigned to some string, but will not work if you need to look up the string from its value. The latter case can be solved if you "know" which branch to take while descending from root and the "shared prefix" alone won't give you this information. At least I don't see how it could. I am familiar with the basic prefix hashing procedure suggested by Daciuk (and other authors), but maybe some progress has been made there, I don't know... the one I know is really conceptually simple -- since each arc encodes the number of leaves (or input sequences) in the automaton, you know which path must lead you to your string. For example if you have a node like this and seek for the 12-th term: 0 -- 10 -- ... +- 10 -- ... +- 5 -- .. you look at the first path, it'd give you terms 1..10, then the next one contains terms 11..20 so you add 10 to an internal counter which is added to further computations, descend and repeat the procedure until you find a leaf node. Dawid