Hi,

You may want to look into plagiarism detectors.

My approach would be:
1. Hash all the keywords in one file and keep a count for each.
2. For each keyword in the other file, check whether it exists in the hash
table with a nonzero count. If it does, decrement its count and increment a
counter that represents the similarity between the two docs.

For the percentage, you might also count the total keywords in the second doc
and compute "found keywords" / "total keywords".
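
The two steps above could be sketched roughly as follows in Python. This is only an illustration, not a tuned implementation: the function names keywords() and similarity_percent() are made up for this example, and treating every lowercase run of letters, digits or apostrophes as a "keyword" is an assumption (a real detector would likely drop stop words and normalize more carefully).

```python
import re
from collections import Counter


def keywords(text):
    # Assumption: a "keyword" is any lowercased run of letters, digits or apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def similarity_percent(doc_a, doc_b):
    # Step 1: hash all keywords of the first doc and keep a count for each.
    counts = Counter(keywords(doc_a))

    # Step 2: for each keyword of the second doc, consume one occurrence
    # from the table if any is left, and count it as a match.
    words_b = keywords(doc_b)
    found = 0
    for word in words_b:
        if counts[word] > 0:
            counts[word] -= 1
            found += 1

    # Percentage = found keywords / total keywords in the second doc.
    if not words_b:
        return 0.0
    return 100.0 * found / len(words_b)


if __name__ == "__main__":
    print(similarity_percent("the cat sat on the mat", "the cat ate the rat"))
```

Note that this measure ignores word order and is not symmetric (swapping the two docs can change the result), so it answers a different question than the diff / Longest Common Subsequence approach suggested earlier in the thread.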

On Wed, Jul 6, 2011 at 11:41 AM, Navneet Gupta <navneetn...@gmail.com> wrote:

> See the diff documentation. It's an application of the Longest Common
> Subsequence problem.
> http://en.wikipedia.org/wiki/Diff
>
> On Wed, Jul 6, 2011 at 11:12 AM, priyanshu <priyanshuro...@gmail.com>
> wrote:
> > What is the most efficient way to compare two text documents? Also, we
> > need to find the percentage by which they match.
> >
> > Thanks,
> > priyanshu
> >
> > --
> > You received this message because you are subscribed to the Google Groups
> "Algorithm Geeks" group.
> > To post to this group, send email to algogeeks@googlegroups.com.
> > To unsubscribe from this group, send email to
> algogeeks+unsubscr...@googlegroups.com.
> > For more options, visit this group at
> http://groups.google.com/group/algogeeks?hl=en.
> >
> >
>
>
>
> --
> Navneet
>


-- 
regards,
chinna.
