Dear Techies,

Hope you are doing well. I need some help with a Python script. We are handling a corpus of lakhs of sentences with Microsoft. I have a set of 60 Excel files containing more than 796,000 sentences.
Our task is to find the duplicate sentences in this corpus. I wrote a Python function using the pandas library. With it I am able to find and remove the duplicates and combine all the Excel sheets into a single CSV file. Up to this step it works perfectly, and I can calculate the total number of duplicates across all 60 files.

I need two additional features in my script:

1. I want to find the number of duplicates in each file.
2. If I have already checked a folder containing 10 files, the script reports the duplicates for those files. If I then add more files to the same folder, the new files need to be matched against the records that were already checked.

Kindly help me.

Thank you,
Regards,
LOKESHKUMAR RAVI
(Machine Learning Engineer)
Langscape Language Solutions Pvt Ltd.
Techno Blogger: https://lokesh7797anon.blogspot.in
Attachment: Dedupes.py (binary data)
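The two requested features (a duplicate count per file, and matching newly added files against records that were already checked) could be sketched roughly as below. This is only an illustration under stated assumptions, not the attached script: the column name "Sentence", the state file format, and every function name here are hypothetical, and treating sentences as equal after lowercasing and collapsing whitespace is an assumption that may not match how the original script defines a duplicate.

```python
import json
from pathlib import Path


def normalize(sentence):
    """Lowercase and collapse whitespace (assumed definition of 'same sentence')."""
    return " ".join(str(sentence).split()).lower()


def count_duplicates(sentences_by_file, seen=None):
    """Return ({file name: duplicate count}, updated set of seen sentences).

    A sentence counts as a duplicate if it already appeared earlier in the
    same file, in an earlier file of this run, or in `seen` (earlier runs).
    """
    seen = set() if seen is None else set(seen)
    counts = {}
    for name, sentences in sentences_by_file.items():
        dupes = 0
        for sentence in sentences:
            key = normalize(sentence)
            if key in seen:
                dupes += 1
            else:
                seen.add(key)
        counts[name] = dupes
    return counts, seen


def load_new_files(folder, processed, column="Sentence"):
    """Read only the Excel files whose names are not yet in `processed`.

    The column name "Sentence" is hypothetical; adjust it to the real sheets.
    """
    import pandas as pd  # needed only when actually reading workbooks

    result = {}
    for path in sorted(Path(folder).glob("*.xlsx")):
        if path.name not in processed:
            result[path.name] = pd.read_excel(path)[column].dropna().tolist()
    return result


def load_state(path):
    """Load the sentences and file names remembered from earlier runs."""
    state_file = Path(path)
    if not state_file.exists():
        return set(), set()
    data = json.loads(state_file.read_text(encoding="utf-8"))
    return set(data["seen"]), set(data["files"])


def save_state(path, seen, files):
    """Persist the seen sentences and processed file names as JSON."""
    payload = {"seen": sorted(seen), "files": sorted(files)}
    Path(path).write_text(json.dumps(payload, ensure_ascii=False), encoding="utf-8")
```

A run over a folder would then call load_state, pass the already processed file names to load_new_files, feed the result and the remembered sentences to count_duplicates, and finish with save_state. Because the set of seen sentences is stored between runs, files added to the folder later are compared against everything checked before without re-reading the old workbooks.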
_______________________________________________ Chennaipy mailing list Chennaipy@python.org https://mail.python.org/mailman/listinfo/chennaipy