Jérôme Pouiller writes: > In another thread, Adeodato Simó wrote: >> I can't see how it'd work here, at least without the help of some >> on-disk structure, since we're talking about a space of 25,000 >> packages. > > Naive search of matching string under a set of 25,000 strings is > something like 2000 times slower (maybe far more) than a correct > algorithm. Adeodato thinks it is not usable. I said we shouldn't reject > idea because of performances and I suggested a better algorithm should > be used.
If we are pulling numbers out of the air, reading data from disk is something like 2000 times slower (maybe far more) than doing string comparisons. I/O may dominate the running time even for a naive comparison. I have some relevant experience that makes me skeptical about needing sophisticated structures to achieve acceptable performance: Ten years ago, I wrote a simple spell checker as part of a job interview. It was able to suggest (ranked) close matches for misspelled words from a ~100,000-word dictionary within a few milliseconds. Given that I had less than an hour's notice about what I would have to write, and only about 4 hours to write and test the program, I would be very surprised if performance for any reasonable algorithm on more modern machines will be bad enough to make it unusable. It is not as if the application code should require significant rewriting based on whether it performs an element-wise comparison or whether it passes some function an array of strings against which to match. The library code is what would need much deeper changes. So why not see how well the existing library works before demanding that it be radically rewritten? If you want to implement a superior algorithm, no one will stop you. If the fuzzy match only runs when no exact match is found, it suffers no slowdown on success and may be fast enough when there is incorrect input. If there are people who think it is unacceptably slow, let them come forward with actual numbers in hand. Insisting on much more development work because of an unsubstantiated concern seems like a poor trade-off. Michael -- To UNSUBSCRIBE, email to debian-devel-requ...@lists.debian.org with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org