Hi JG,

Thanks for the information. I will dig into it further.

Best,
Arber

On Thu, Aug 20, 2009 at 12:25 AM, Jonathan Gray <[email protected]> wrote:

> Arber,
>
> I don't have any links to papers handy, unfortunately.  Quite honestly,
> there is a TON of research on this subject.  My recommendation is to dig
> around ACM, where you can find many papers related to duplicate detection.
> If you don't have an ACM membership for the archives, digging around Google
> should still yield some results.
>
> Generally, an online dupe-detection system takes advantage of some kind
> of "signature" or dimensionality reduction that permits a level of fuzzy
> matching.  The implementations vary greatly depending on the domain; for
> example, near-duplicate image detection is a heavily researched field, as
> is text-based duplicate detection.
>
> As I said, this topic is well beyond the scope of this mailing list.  A bit
> of legwork should yield more papers than you can possibly read :)
>
> JG
>
>
> Yabo-Arber Xu wrote:
>
>> Hi JG,
>>
>> Sorry for interrupting the ongoing topic, but I am quite interested in the
>> online dup detection method you mentioned. Could you please elaborate on it
>> a bit, or point me to some links that I can follow?
>>
>> Best,
>> Arber
>>
>>
>> On Wed, Aug 19, 2009 at 1:51 AM, Jonathan Gray <[email protected]> wrote:
>>
>>> You didn't talk much about how you plan on doing dupe-detection of
>>> questions, but there are some interesting ways to generate signatures
>>> which could turn into your row keys, then you could actually do some kind
>>> of online duplicate detecting of already answered questions. That's beyond
>>> the scope of this mailing list, however.
>>>
>>>
>>
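
For anyone following up on the "signature" idea discussed above, here is a minimal Python sketch of two common approaches for text. It is not necessarily what JG had in mind. The stopword list, the function names, and the choice of SHA-1 and a 64-bit SimHash are all assumptions made for illustration. `signature` gives an exact-match key (a candidate row key) that ignores case, punctuation, word order, and a few filler words; `simhash` gives a fingerprint where similar questions tend to differ in only a few bits.

```python
import hashlib
import re
from typing import List

# Hypothetical stopword list, chosen only for this illustration.
STOPWORDS = {"a", "an", "the", "is", "are", "do", "does",
             "i", "to", "in", "of", "how", "what"}

SIMHASH_BITS = 64  # assumed fingerprint width (uses 8 bytes of each token hash)


def normalize(question: str) -> List[str]:
    """Lowercase, strip punctuation, drop stopwords, dedupe and sort tokens."""
    tokens = re.findall(r"[a-z0-9]+", question.lower())
    return sorted({t for t in tokens if t not in STOPWORDS})


def signature(question: str) -> str:
    """Hex digest of the normalized question; usable as a fixed-width key."""
    canonical = " ".join(normalize(question))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def simhash(question: str) -> int:
    """SimHash-style fingerprint built from per-token hashes."""
    counts = [0] * SIMHASH_BITS
    for token in normalize(question):
        digest = hashlib.sha1(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        # Each token votes +1 or -1 on every bit position.
        for i in range(SIMHASH_BITS):
            counts[i] += 1 if (h >> i) & 1 else -1
    # A bit is set in the fingerprint when the votes for it are positive.
    return sum(1 << i for i in range(SIMHASH_BITS) if counts[i] > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")
```

With this sketch, `signature("How do I install HBase?")` and `signature("install hbase")` produce the same key, so a lookup on that key would find an already answered question. For looser matching, one would compare `hamming(simhash(a), simhash(b))` against a threshold tuned on real data.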
