Dear Techies,

Hope you are doing well. I need some help with a Python script. We are
working with Microsoft on a corpus containing lakhs of sentences. I have a
set of 60 Excel files with more than 796,000 sentences.

Our task is to find the duplicate sentences in this corpus. I wrote a
Python function using the pandas library. With it I am able to find and
remove the duplicates, count the total number removed, and combine all the
Excel sheets into a single CSV file.
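
For context, the current step can be sketched roughly as below. This is only
a minimal illustration and not the attached Dedupes.py: the column name
"sentence", the function name combine_and_dedupe, and the sample data are all
assumptions made for the example.

```python
import pandas as pd

# Assumption: every file keeps its sentences in a column named "sentence".
SENTENCE_COL = "sentence"


def combine_and_dedupe(frames):
    """Concatenate per-file frames, drop repeated sentences, report how many were dropped."""
    combined = pd.concat(frames, ignore_index=True)
    # keep="first" retains the first occurrence of each sentence.
    deduped = combined.drop_duplicates(subset=SENTENCE_COL, keep="first")
    removed = len(combined) - len(deduped)
    return deduped.reset_index(drop=True), removed


# Tiny in-memory stand-ins for two Excel files. In practice each frame
# would come from pd.read_excel(path), and the result would be written
# out with deduped.to_csv("combined.csv", index=False).
frames = [
    pd.DataFrame({SENTENCE_COL: ["hello world", "good morning"]}),
    pd.DataFrame({SENTENCE_COL: ["good morning", "see you soon"]}),
]
deduped, removed = combine_and_dedupe(frames)
print(removed, "duplicate(s) removed;", len(deduped), "sentence(s) kept")
```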

Up to this step it works perfectly: I can calculate the total number of
duplicates across all 60 files. I need some additional functionality in my
script.

* I want to find the number of duplicates in each file.
* If I have already checked a folder of 10 files, it reports the duplicates
  found so far. If I then add more files to the same folder, the new records
  need to be matched against the ones already checked.
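
To make the two requests concrete, here is a rough sketch of the behaviour I
have in mind, using only the standard library. The function names
(count_duplicates, load_seen, save_seen), the file names, and the choice to
strip surrounding whitespace before comparing are assumptions for
illustration. A sentence is counted as a duplicate in the file where it
reappears, so its first occurrence is never counted.

```python
import json
from pathlib import Path


def count_duplicates(sentences_by_file, seen=None):
    """Return ({file name: duplicate count}, updated set of sentences seen so far)."""
    seen = set() if seen is None else set(seen)
    counts = {}
    for name, sentences in sentences_by_file.items():
        dupes = 0
        for sentence in sentences:
            key = sentence.strip()  # assumption: ignore surrounding whitespace
            if key in seen:
                dupes += 1
            else:
                seen.add(key)
        counts[name] = dupes
    return counts, seen


def load_seen(path):
    """Load previously seen sentences from a JSON file, or start empty."""
    p = Path(path)
    return set(json.loads(p.read_text(encoding="utf-8"))) if p.exists() else set()


def save_seen(path, seen):
    """Persist the seen sentences so a later run can match new files against them."""
    Path(path).write_text(json.dumps(sorted(seen), ensure_ascii=False), encoding="utf-8")


# First run: two files already in the folder.
first_counts, seen = count_duplicates({
    "file_01.xlsx": ["hello world", "good morning", "hello world"],
    "file_02.xlsx": ["good morning", "see you soon"],
})

# Later run: a new file is added and matched against everything seen before.
later_counts, seen = count_duplicates(
    {"file_11.xlsx": ["see you soon", "brand new"]}, seen
)
print(first_counts, later_counts)
```

With pandas, the sentence lists could be taken from each file's sentence
column, and save_seen / load_seen would carry the state between runs so that
the earlier files do not need to be read again.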

Kindly help me.

Thank you,
Regards,
LOKESHKUMAR RAVI (Machine Learning Engineer)
Langscape Language Solutions pvt Ltd.
Techno Blogger:  https://lokesh7797anon.blogspot.in

Attachment: Dedupes.py
Description: Binary data

_______________________________________________
Chennaipy mailing list
Chennaipy@python.org
https://mail.python.org/mailman/listinfo/chennaipy
