Hi JG,

Thanks for your information. I will dig more.

Best,
Arber

On Thu, Aug 20, 2009 at 12:25 AM, Jonathan Gray <[email protected]> wrote:

> Arber,
>
> I don't have any links to papers handy, unfortunately. Quite honestly
> there is a TON of research on this subject. My recommendation is to dig
> around ACM, you can find many papers related to duplicate detection. If you
> don't have an ACM membership to the archives, digging around Google should
> still yield some results.
>
> Generally an online dupe detection system would take advantage of some kind
> of "signature" or dimensional reduction that permits a level of fuzzy
> matching. The implementations vary greatly depending on the domain, for
> example near-duplicate image detection is a heavily researched field as well
> as text-based.
>
> As I said, this topic is well beyond the scope of this mailing list. A bit
> of legwork should yield more papers than you can possibly read :)
>
> JG
>
>
> Yabo-Arber Xu wrote:
>
>> Hi JG,
>>
>> Sorry for interrupting the ongoing topic, but I am quite interested in the
>> online dup detection method you mentioned. Could you please elaborate it a
>> bit, or point out some links and I will follow?
>>
>> Best,
>> Arber
>>
>>
>> On Wed, Aug 19, 2009 at 1:51 AM, Jonathan Gray <[email protected]> wrote:
>>
>>> You didn't talk much about how you plan on doing dupe-detection of
>>> questions, but there are some interesting ways to generate signatures which
>>> could turn into your row keys, then you could actually do some kind of
>>> online duplicate detecting of already answered questions. That's beyond the
>>> scope of this mailing list, however.
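
For readers following the thread, the "signature" idea JG describes can be illustrated with a minimal sketch. It assumes one common technique, MinHash over word shingles, which the thread itself does not name. All function names and parameters below (`shingles`, `minhash_signature`, `band_keys`, `num_hashes`, `bands`) are hypothetical and chosen only for illustration. The banded keys are one possible way a signature could be turned into lookup keys, not a description of any particular system.

```python
import hashlib
import re


def shingles(text, k=3):
    """Return the set of k-word shingles of a normalised text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _hash(seed, token):
    """Deterministic 64-bit hash of a token, varied by an integer seed."""
    digest = hashlib.sha1(f"{seed}:{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(text, num_hashes=64, k=3):
    """Compact signature: the minimum hash of the shingle set per seed."""
    toks = shingles(text, k)
    if not toks:
        raise ValueError("text contains no tokens")
    return tuple(min(_hash(seed, t) for t in toks) for seed in range(num_hashes))


def estimated_similarity(sig_a, sig_b):
    """Fraction of matching positions, an estimate of Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def band_keys(sig, bands=16):
    """Split a signature into bands and hash each band into a lookup key.

    Assumption: len(sig) is divisible by bands. Two texts that share any
    key are candidate near-duplicates and can be compared more carefully.
    """
    rows = len(sig) // bands
    keys = []
    for b in range(bands):
        chunk = sig[b * rows:(b + 1) * rows]
        keys.append(f"{b}:{hashlib.sha1(repr(chunk).encode('utf-8')).hexdigest()[:16]}")
    return keys


if __name__ == "__main__":
    q1 = "How should I design row keys for time series data?"
    q2 = "how should I design row keys for time-series data"
    s1, s2 = minhash_signature(q1), minhash_signature(q2)
    print(estimated_similarity(s1, s2))
    print(len(set(band_keys(s1)) & set(band_keys(s2))))
```

In this sketch a new question would be compared only against stored questions sharing at least one band key, which is what makes the check feasible online. The shingle size, number of hashes, and number of bands are tuning choices that depend on the domain.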
