* Roberto Rosario: " Re: [Mayan EDMS: 836] Automatic upload from certain staging folder" (Fri, 5 Sep 2014 23:37:31 -0700 (PDT)):
> On Wednesday, September 3, 2014 6:48:15 PM UTC-4, Mathias Behrle wrote:
> >
> > * Roberto Rosario: " Re: [Mayan EDMS: 816] Automatic upload from certain
> > staging folder" (Wed, 3 Sep 2014 11:47:52 -0700 (PDT)):
> >
> > > I like the barcode/qrcode idea very much, would allow for batch scanning,
> > > for example several documents placed in a scanner with a document feeder
> > > and each document has a printed page with a barcode defining the metadata
> > > kind of like FAX cover pages. Regional OCR is a must have feature and
> > > usually a defining feature of the commercial offerings, I don't know how
> > > accurate OCRing a rectangle of text would be, but if there is a need for
> > > the feature let's do it. We need a way to let users mark/highlight the
> > > fields they want scanned and entered as metadata. This would require some
> > > design decisions (do we store the cursor's x and y positions of the square
> > > to be scanned or the x and y % in relation to the current zoom level) and
> > > a rich client w/ corresponding API endpoints to talk to the backend.
> > >
> > > On Wednesday, August 27, 2014 5:22:32 PM UTC-4, Mathias Behrle wrote:
> > > >
> > > > * Roberto Rosario: " Re: [Mayan EDMS: 761] Automatic upload from certain
> > > > staging folder" (Wed, 30 Jul 2014 13:36:51 -0400):
> > > >
> > > > > This feature was actually started some time ago
> > > > > (https://github.com/mayan-edms/mayan-edms/blob/master/mayan/apps/sources/models.py#L194)
> > > > > but is not yet enabled because it depends on some scheduling updates
> > > > > that have not made it into the master branch.
> > > > >
> > > > > As for metadata, I came up with some ideas but none are implemented.
> > > > > One was to let users set default metadata values as well as document
> > > > > type for each watch folder.
> > > > > Another idea was when a document is being imported from a watch
> > > > > folder to look for a file with the same name but with the .metadata
> > > > > extension. No design decision has been reached yet so any ideas are
> > > > > welcomed.
> > > >
> > > > Both possibilities could have their individual use cases, for which they
> > > > fit best. The most flexible approach is the second.
> > > >
> > > > What I found when evaluating other DMS software:
> > > >
> > > > - Inclusion of some identifier on the document (could be a barcode, or
> > > >   some specially formatted string, or...). This identifier need not
> > > >   necessarily be fixed on the document, but could be the first page of a
> > > >   scan or some paper scanned together with the document. This method
> > > >   applies preferably to scanned documents.
> > >
> > > I like the barcode/qrcode idea very much, would allow for batch scanning,
> > > for example several documents placed in a scanner with a document feeder
> > > and each document has a printed page with a barcode defining the metadata
> > > kind of like FAX cover pages.
> >
> > Yes, the comparison with the Fax cover page hits the mark.
> >
> > Question:
> > When batch scanning, how to determine the beginning and the end of a batch?
> > Will each document require a 'cover page' or can such a cover page be valid
> > for several documents? Perhaps the number of documents could be included on
> > the cover page, but this would always require a new cover page per batch.
>
> I don't think we would need to specify the page count. We can come up with
> some base codes that are encoded into a qrcode and printed as a cover page.
> When Mayan detects the cover page all documents or pages detected afterwards
> inherit whatever metadata, document type or any setting specified in the
> cover page.
> If another cover page is detected Mayan knows this is the
> beginning of a new document or documents. Example:
>
> * A 'set metadata' cover page with some values encoded: vendor="vendor 1"
> * A 'set metadata' cover page with some values encoded: vendor="vendor 2"
> * A 'new document' cover page
>
> The physical document paper sandwich would be:
>
> - Set metadata cover, vendor 1
> - New document cover page
> - Document 1 page 1
> - Document 1 page 2
> - New document cover page
> - Document 2 page 1
> - Document 2 page 2
> - Set metadata cover, vendor 2
> - New document cover page
> - Document 3 page 1
> - Document 3 page 2
>
> All of this is scanned in one go using a paper feeder and we just scanned
> and pushed into Mayan 3 multipage documents with 2 of them using the same
> metadata and one with a different metadata. We can create more 'control
> messages' for new cover page types as we need along the road and can cover
> several user scenarios. We can create an 'End document' cover page if
> needed. The cover page is just a blank page with a QR code. Control cover
> pages can be physically reused or photocopied; only cover pages with dynamic
> user data like the set metadata cover page would need to be printed more
> than once if the metadata changes, but if the metadata is periodic, like
> say vendor names, they can also be reused.

Sounds good to me! It covers all scenarios I can imagine, including the one to
reset the metadata with a metadata cover only containing empty values.

> > > > - Rather straightforward is a sort of recognition, where templates can
> > > >   be defined containing regions formatted in an individual way. E.g. if
> > > >   you have a supplier with his custom invoice format displaying the
> > > >   invoice number, date, amount at fixed places, they could be used on
> > > >   such a template and the software can check whether the document
> > > >   contains such a region.
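
For illustration, the cover-page scheme discussed above could be sketched
roughly as follows in Python. This is only a sketch under assumptions: the
control codes `set_metadata` and `new_document`, the `Page` and `Document`
classes and the `split_batch` function are hypothetical names, not existing
Mayan code, and the QR codes are assumed to have been decoded already.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical control codes; the thread does not define the real encoding.
SET_METADATA = "set_metadata"
NEW_DOCUMENT = "new_document"


@dataclass
class Page:
    """One scanned sheet; `control` holds a decoded cover-page code, if any."""
    content: str = ""
    control: Optional[str] = None
    values: dict = field(default_factory=dict)


@dataclass
class Document:
    """A group of content pages plus the metadata inherited from cover pages."""
    metadata: dict
    pages: list = field(default_factory=list)


def split_batch(pages):
    """Group a scanned stack into documents, applying cover-page metadata."""
    documents = []
    current_metadata = {}
    current = None
    for page in pages:
        if page.control == SET_METADATA:
            # Documents that follow inherit these values until the next
            # 'set metadata' cover; empty values reset the metadata.
            current_metadata = dict(page.values)
            current = None
        elif page.control == NEW_DOCUMENT:
            # Any cover page closes the document in progress.
            current = None
        else:
            if current is None:
                # First content page after a cover page opens a new document.
                current = Document(metadata=dict(current_metadata))
                documents.append(current)
            current.pages.append(page.content)
    return documents
```

Fed the paper sandwich from the example, this would yield three two-page
documents, the first two carrying vendor 1 and the third vendor 2. An 'End
document' cover page could be added as one more control code that also sets
`current` to None.
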
> > > Regional OCR is a must have feature and usually a defining feature of the
> > > commercial offerings, I don't know how accurate OCRing a rectangle of text
> > > would be but if there is a need for the feature let's do it. I see some
> > > requirements, we need a way to let users mark/highlight the fields they
> > > want scanned and entered as metadata. This would require some design
> > > decisions (do we store the cursor's x and y positions of the square to be
> > > scanned or the x and y % in relation to the current zoom level)
> >
> > The more agnostic of the zoom level, the better. So I would think x and y in
> > relation to X and Y (where X and Y are the dimensions of the whole page).
> >
> > > and a rich client w/ corresponding API endpoints to talk to the backend.
> >
> > Do you mean, a separate client is needed for that purpose?
>
> I meant that some interactive javascript/jquery code will be needed on the
> template, sorry about the confusing wording :)

Thanks, now clear for me.

-- 
Mathias Behrle
PGP/GnuPG key available from any keyserver, ID: 0x8405BBF6
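
To make the zoom-agnostic idea concrete (x and y stored in relation to the
page dimensions X and Y), here is a minimal Python sketch. The class
`RelativeRegion` and its methods are hypothetical names chosen for
illustration, not part of Mayan; it assumes the selection arrives as pixel
corners on a rendering whose size is known.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelativeRegion:
    """Rectangle stored as fractions (0.0 to 1.0) of the page width and height."""
    x: float
    y: float
    width: float
    height: float

    @classmethod
    def from_pixels(cls, left, top, right, bottom, page_width, page_height):
        """Convert a selection drawn on a rendering of the given pixel size."""
        return cls(
            x=left / page_width,
            y=top / page_height,
            width=(right - left) / page_width,
            height=(bottom - top) / page_height,
        )

    def to_pixels(self, page_width, page_height):
        """Return (left, top, right, bottom) for a rendering of the given size."""
        left = round(self.x * page_width)
        top = round(self.y * page_height)
        right = round((self.x + self.width) * page_width)
        bottom = round((self.y + self.height) * page_height)
        return left, top, right, bottom
```

A region marked at one zoom level can then be mapped onto any other
rendering of the same page, for example the full-resolution image handed to
the OCR step, by calling `to_pixels` with that image's dimensions.
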
