hash on each page and compare the hash values.
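
A minimal sketch of that idea, for illustration only. It assumes SHA-256 as the hash and that "duplicate" means byte-for-byte identical content; neither is specified in this thread, and the names page_digest and find_duplicates are made up for the example. Near-duplicates (same text, different markup or whitespace) would not be caught this way.

```python
import hashlib
from collections import defaultdict


def page_digest(content: bytes) -> str:
    """Return a fixed-size fingerprint of a page body (SHA-256 is an assumption)."""
    return hashlib.sha256(content).hexdigest()


def find_duplicates(pages):
    """Group URLs by content digest.

    pages: iterable of (url, content_bytes) pairs (hypothetical input shape).
    Returns {digest: [urls]} containing only digests shared by 2+ URLs.
    """
    by_digest = defaultdict(list)
    for url, content in pages:
        by_digest[page_digest(content)].append(url)
    return {d: urls for d, urls in by_digest.items() if len(urls) > 1}


if __name__ == "__main__":
    sample = [
        ("http://a.example/1", b"hello world"),
        ("http://b.example/2", b"something else"),
        ("http://c.example/3", b"hello world"),
    ]
    for digest, urls in find_duplicates(sample).items():
        print(digest[:12], urls)
```

Hashing is one pass over the data, so the cost is linear in the total size of the pages, and only one fixed-size digest per page has to be kept. With a billion pages the digest table itself may not fit on one machine, in which case the digests could be partitioned (for example by prefix) and each partition compared separately.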

Thanks & regards,
Sathaiah Dontula

On Tue, May 3, 2011 at 8:59 PM, bittu <[email protected]> wrote:

> Suppose you have a billion URLs, where each is a huge page. How do you
> detect the duplicate documents?
> On what criteria will you detect them, with what algorithm or approach,
> and what will be the complexity of each approach?
> As it has many applications in computer science, I would like to have
> some good discussion on this topic.
>
> Let's explore all the approaches.
>
> Thanks & Regards
> Shashank
> CSE, BIT Mesra
>
> --
> You received this message because you are subscribed to the Google Groups
> "Algorithm Geeks" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/algogeeks?hl=en.
>
>

