On Thu, 2007-08-02 at 16:10 -0400, Kurt Lidl wrote: > Steve Murphy wrote: > > On Thu, 2007-08-02 at 13:09 -0400, Kurt Lidl wrote: > >> Vasil Kolev wrote: > >>> В ср, 2007-08-01 в 16:08 -0600, Steve Murphy написа: > >>> > >>>> We might be able to extend the current algorithm to handle a few more > >>>> simple cases without resorting to such huge machinery. We shall see. > >>>> > >>> Going through this thread, I'm thinking - regular expressions are parsed > >>> to finite state machines, and it shouldn't be hard (for some values of > >>> hard, I have to brush on my discrete maths memories) to build one > >>> automaton to handle the full dialplan for a context and to be able on > >>> each digit to cut the possible places to go. There's some stuff that > >>> can't be done this way for sure (as we all know, all kind of weird shit > >>> is possible with regex), but at least the current syntax can get a bit > >>> extended, and probably a lot of IVRs simplified. > >> The data structure that you're grasping for is the 'trie': > >> http://en.wikipedia.org/wiki/Trie > >> > >> At a previous job, one of my developers wrote a regex -> trie generator, > >> so we could do high-speed lookups of prefix and suffix matches on > >> usernames coming into a radius proxy. Once we finally got it debugged, > >> it was amazingly fast. For that particular appplication we had several > >> hundred prefix and suffix patterns, and generated one giant tree to > >> handle them all. > >> > >> This was pretty slow to generate. So we did the work in a helper > >> program, and wrote the binary trie into a disk file. The main app > >> would stat() the file periodically, and if it was newer than > >> the last one read in, just suck the entire thing into memory, and > >> change the pointer to the root of the trie. I guess we called > >> free on the old one. Seamless, fast and worked extremely well... > >> > >> Generating one trie per context would probably work really well. > >> > >> -Kurt > >> > > > > Kurt-- > > > > Yes, what I did in fast-ast is, according to definition, a trie. > > And, yes, one per context. And, yes, it works really well. > > > > No, it's not a binary tree. But that's not a requirement. > > I wasn't trying to suggest that this was necessary or required. I > was just pointing out for our application of yesteryear, running on > the machines we had at the time, using the input regular expressions > that we needed to support we had to go the route of doing "off-line" > trie calculation. > > As I recall it took on the order of 5 to 10 minutes to generate > the trie for our application. Which was an unacceptable delay > whenever the prefix/suffix table changed (about once a week) > or when the machine(s) running the application were restarted > or the proxy restarted. > > I was putting that out there more so that people knew there was an > easy way to avoid to do extremely compute intensive updates without > compromising the basic operation of the application. Some people just > throw up their hands and say "it's too hard to calculate this at > runtime", whereas you really only need to do the recalculations when > the contents of a context change. (I have no idea how this intersects > with the 'realtime' method of doing things.) > > -Kurt
Kurt-- I only made the comment about the binary tree, because the example in the article showed a binary tree... Your comments are very good. You hit the required data structure dead-center. As far as generation time, With a large 1000+ input set, the trie/tree is fairly quick to build, as the pattern length is usually only 10-16 chars long if that. If you do the trie/tree building at load time, this shouldn't be a problem, and even if you do it at run time (as I think my code does), it hasn't been a problem. For the sake of stuff like Realtime, I feel that "editing" the tree, to remove a pattern (probably by just marking it as dead), or adding a new pattern, should be both possible, and lightening fast. murf -- Steve Murphy Software Developer Digium
smime.p7s
Description: S/MIME cryptographic signature
_______________________________________________ --Bandwidth and Colocation Provided by http://www.api-digital.com-- asterisk-dev mailing list To UNSUBSCRIBE or update options visit: http://lists.digium.com/mailman/listinfo/asterisk-dev
