Those are very good ideas! 

There was once a duplicate search feature, but it was removed due to lack of
usage and because it ran in the foreground and could take a long time: the
checksum of each document was checked against the checksum of every other
document, so the running time grew quadratically with the number of
documents. If finding duplicates were the first step, the second step would
be to look up those documents through the API by checksum. The checksum
field is not exposed by the API yet, so that is another API update that
would need to be done.
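Comparing every checksum against every other one takes n(n-1)/2 comparisons.
Grouping documents by checksum in a dictionary finds the same duplicates in a
single pass. A minimal sketch, assuming plain files on disk (for example a
copy of the document storage directory, as in Victor's shell script idea).
It is a standalone illustration, not Mayan's code. SHA-256 is used here; MD5,
as Victor suggests, would work the same way. Removing the duplicates from
Mayan itself would still need the API step described above.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_checksum(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths):
    """Group paths by checksum in one pass; keep only groups with 2+ files."""
    groups = defaultdict(list)
    for path in paths:
        groups[file_checksum(path)].append(Path(path))
    return {checksum: files for checksum, files in groups.items() if len(files) > 1}
```

For example, `find_duplicates(Path("some/copy/of/storage").rglob("*.pdf"))`
(a hypothetical path) returns one list of paths per set of identical files.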

Skipping duplicates from the watch folder would be less difficult, since
this is just a single query to see if the checksum already exists in
the database.
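The watch folder check is a single indexed lookup before each new file is
stored. A minimal sketch follows, assuming a hypothetical `documents` table
with a `checksum` column in an in-memory SQLite database. It stands in for
Mayan's real models, which are different. The function names are
illustrative only.

```python
import hashlib
import sqlite3


def setup_database(connection):
    """Create a stand-in documents table with an indexed checksum column."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        "id INTEGER PRIMARY KEY, label TEXT, checksum TEXT)"
    )
    connection.execute(
        "CREATE INDEX IF NOT EXISTS idx_documents_checksum ON documents (checksum)"
    )


def is_duplicate(connection, checksum):
    """Single query: does any stored document already have this checksum?"""
    row = connection.execute(
        "SELECT EXISTS(SELECT 1 FROM documents WHERE checksum = ?)", (checksum,)
    ).fetchone()
    return bool(row[0])


def ingest(connection, label, content):
    """Store a new document unless its checksum is already known.

    Returns True if the document was stored, False if it was skipped.
    """
    checksum = hashlib.sha256(content).hexdigest()
    if is_duplicate(connection, checksum):
        return False
    connection.execute(
        "INSERT INTO documents (label, checksum) VALUES (?, ?)", (label, checksum)
    )
    return True


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    setup_database(connection)
    print(ingest(connection, "invoice-001.pdf", b"%PDF-1.4 example"))       # True
    print(ingest(connection, "invoice-001-copy.pdf", b"%PDF-1.4 example"))  # False
```

The explicit lookup mirrors the single query described above. A UNIQUE
constraint on `checksum` would also let the database reject duplicates when
two identical files arrive at the same time.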

I'm updating the roadmap wiki 
(https://gitlab.com/mayan-edms/mayan-edms/wikis/roadmap/) and will add 
these.

Thank you!

On Monday, January 30, 2017 at 8:22:49 PM UTC-4, Victor Zele wrote:
>
> We have several watch folders set up for contracts, invoices, quotes, etc.
>
> It would be nice if Mayan would validate a new document does not exist 
> already in the system by checking maybe an MD5 checksum table of current 
> documents in the system and reject the new document as already existing.
>
> Also, for duplicates, it would be nice to run a cleaner on the 
> /opt/mayan-edms/lib/python2.7/site-packages/mayan/media/document_storage 
> directory of PDFs to clean out duplicates.  I can write a shell script to 
> check for PDF duplicates via MD5 sums, but no way to automate cleaning them 
> out of the Mayan system/DB.
>
> Just an idea,
> Victor
>

-- 

--- 
You received this message because you are subscribed to the Google Groups 
"Mayan EDMS" group.