Ken Krugler wrote:

common case. Thus it could be somewhat computationally expensive (e.g. winnowing, à la http://theory.stanford.edu/~aiken/publications/papers/sigmod03.pdf).

Interesting paper, thanks for the pointer - I always wondered what criteria to use to reduce the number of shingles, and this winnowing is a simple enough recipe for creating page signatures. I may be tempted to implement it ;)
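
For anyone following along, here is a rough Python sketch of how I read the winnowing recipe: hash every k-gram, slide a window of w hashes over the sequence, and keep the minimum of each window (the rightmost one on ties). The function names, the defaults k=5 and w=4, and the truncated MD5 hash are all assumptions made for illustration; the paper uses a rolling hash, and none of this comes from existing code.

```python
import hashlib


def kgram_hashes(text, k):
    """Hash every k-character substring of the normalized text.

    Normalization (lowercase, alphanumerics only) and the truncated
    MD5 hash are illustrative assumptions, not taken from the paper.
    """
    normalized = "".join(ch.lower() for ch in text if ch.isalnum())
    hashes = []
    for i in range(len(normalized) - k + 1):
        digest = hashlib.md5(normalized[i:i + k].encode("utf-8")).digest()
        hashes.append(int.from_bytes(digest[:4], "big"))
    return hashes


def winnow(hashes, w):
    """Return (hash, position) fingerprints chosen by winnowing.

    Each window of w consecutive hashes contributes its minimum value;
    on ties the rightmost occurrence is used. A position is recorded
    only once, even if it is the minimum of several windows.
    Input shorter than w yields no windows and therefore no fingerprints.
    """
    fingerprints = []
    last_pos = -1
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        min_val = min(window)
        # Index of the rightmost occurrence of the minimum in this window.
        offset = w - 1 - window[::-1].index(min_val)
        pos = start + offset
        # Selected positions never move backwards, so comparing with the
        # previous selection is enough to avoid duplicates.
        if pos != last_pos:
            fingerprints.append((min_val, pos))
            last_pos = pos
    return fingerprints


def page_signature(text, k=5, w=4):
    """Set of winnowed hashes for a page (hypothetical helper)."""
    return {h for h, _ in winnow(kgram_hashes(text, k), w)}
```

Two pages could then be compared by the overlap of their page_signature() sets, for example with a Jaccard ratio; where to put the threshold for "near-duplicate" is a separate tuning question.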

I took a quick scan through the public code and didn't find anything that looked appropriate for this. One more potentially useful paper is here:

http://theory.stanford.edu/~aiken/publications/papers/sigmod03.pdf

This URL looks similar to the one you mentioned before ... probably a case of near-duplicate *chuckle* ...

Sorry about that - I can't really claim I was checking your manual dedup support. The real URL is:

http://www1.cs.columbia.edu/~cs6998/final_reports/ca2269-report.pdf

-- Ken
--
Ken Krugler
Krugle, Inc.
+1 530-210-6378
"If you can't find it, you can't fix it"
