Re: [Haskell-cafe] Optimizing a title matcher

Lyle Kopnicky Tue, 26 Sep 2006 19:22:29 -0700

Robert Dockins wrote:

Humm... well, double nested loops seems like the wrong approach.

It may be. I had hoped it would be fast enough that way.

Also, if youare using GHC, it's hashtable implementation has farily well-knownperformance problems. If all you care about is exact matching, then theoperation is essentially a finite map intersection (if I've understood whatyou are trying to do).

No, I don't just care about exact matching. I care about very fuzzymatching. It's just that the code I've implemented so far only doesexact matching on the title strings. It's a first step.

This is just a guess, but I suspect you will probably get much betterperformance (and better-looking code!) by just using Data.Map.intersectionhttp://www.haskell.org/ghc/docs/latest/html/libraries/base/Data-Map.html#v%3Aintersection

Thanks, but that won't help with the fuzzy matching.

Alternately, there is the ternary trie implementation from Edison(http://www.eecs.tufts.edu/~rdocki01) that may also work for you.

That may be useful.

If you need to do prefix matching, then a trie is the way to go. You canprobably code up a nice prefix-intersection operation using tries that shouldgo pretty fast.
If you have some other metric other than prefix in mind for partial matches,then things probably get a lot more complicated. You're probably looking atcalculating minimum distances in some feature-space, which calls for prettysophisticated algorithms if you need good performance.

Yes, that's the kind of thing I'm looking at doing. Looking at editdistance.


Thanks,
Lyle
_______________________________________________
Haskell-Cafe mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Optimizing a title matcher

Reply via email to