* Roberto Rosario: " Re: [Mayan EDMS: 836] Automatic upload from certain
  staging folder" (Fri, 5 Sep 2014 23:37:31 -0700 (PDT)):

> 
> 
> On Wednesday, September 3, 2014 6:48:15 PM UTC-4, Mathias Behrle wrote:
> >
> > * Roberto Rosario: " Re: [Mayan EDMS: 816] Automatic upload from certain 
> >   staging folder" (Wed, 3 Sep 2014 11:47:52 -0700 (PDT)): 
> >
> > > I like the barcode/qrcode idea very much, would allow for batch 
> > scanning, 
> > > for example several documents placed in a scanner with a document feeder 
> > > and each document has a printed page with a barcode defining the 
> > metadata 
> > > kind of like FAX cover pages. Regional OCR is a must have feature and 
> > > usually a defining feature of the commercial offerings, I don't know how 
> > > accurate OCRing a rectangle of text would be but if there is a need for the 
> > > feature let's do it. We need a way to let users mark/highlight the 
> > fields 
> > > they want scanned and entered as metadata. This would require some 
> > design 
> > > decisions (do we store the cursor's x and y positions of the square to 
> > be 
> > > scanned or the x and y % in relation to the current zoom level) and a 
> > rich 
> > > client w/ corresponding API endpoints to talk to the backend. 
> > > 
> > > On Wednesday, August 27, 2014 5:22:32 PM UTC-4, Mathias Behrle wrote: 
> > > > 
> > > > * Roberto Rosario: " Re: [Mayan EDMS: 761] Automatic upload from 
> > certain 
> > > >   staging folder" (Wed, 30 Jul 2014 13:36:51 -0400): 
> > > > 
> > > > > This feature was actually started some time ago (
> > > > > https://github.com/mayan-edms/mayan-edms/blob/master/mayan/apps/sources/models.py#L194)
> > > > > but is not yet enabled because it depends on some scheduling updates
> > > > > that have not made it into the master branch. 
> > > > > 
> > > > > As for metadata, I came up with some ideas but none are implemented. 
> > One 
> > > > > was to let users set default metadata values as well as document 
> > type 
> > > > for 
> > > > > each watch folder. Another idea was when a document is being 
> > imported 
> > > > from 
> > > > > a watch folder to look for a file with the same name but with the 
> > > > .metadata 
> > > > > extension. No design decision has been reached yet so any ideas are 
> > > > > welcomed. 
> > > > 
> > > > Both possibilities could have their individual use cases, for which 
> > they 
> > > > fit 
> > > > best. The most flexible approach is the second. 
> > > > 
> > > > What I found when evaluating other DMS software: 
> > > > 
> > > > - Inclusion of some identifier on the document (could be a barcode, or 
> > > > some 
> > > >   special formatted string, or...). This identifier must not 
> > necessarily 
> > > > be 
> > > >   fixed on the document, but could be the first page of a scan or some 
> > > > paper 
> > > >   scanned together with the document. This method applies preferably 
> > to 
> > > > scanned 
> > > >   documents. 
> > > > 
> > > > 
> > > I like the barcode/qrcode idea very much, would allow for batch 
> > scanning, 
> > > for example several documents placed in a scanner with a document feeder 
> > > and each document has a printed page with a barcode defining the 
> > metadata 
> > > kind of like FAX cover pages. 
> >
> > Yes, the comparison with the Fax cover page hits the mark. 
> >
> > Question: 
> > When batch scanning, how to determine the beginning and the end of a
> > batch? Will each document require a 'cover page' or can such a cover
> > page be valid for several documents? Perhaps the number of documents
> > could be included on the cover page, but this would always require a
> > new cover page per batch.
> >   
> >
> 
> I don't think we would need to specify the page count. We can come up with some 
> base codes that are encoded into a qrcode and printed as a cover page. When 
> Mayan detects the cover page all documents or pages detected afterwards 
> inherit whatever metadata, document type or any setting specified in the 
> cover page. If another cover page is detected Mayan knows this is the 
> beginning of a new document or documents. Example:
> 
> * A 'set metadata' cover page with some values encoded: vendor="vendor 1"
> * A 'set metadata' cover page with some values encoded: vendor="vendor 2"
> * A 'new document' cover page
> 
> The physical document paper sandwich would be:
> 
> - Set metadata cover, vendor 1
> - New document cover page
> - Document 1 page 1
> - Document 1 page 2
> - New document cover page
> - Document 2 page 1
> - Document 2 page 2
> - Set metadata cover, vendor 2
> - New document cover page
> - Document 3 page 1
> - Document 3 page 2
> 
> All of this is scanned in one go using a paper feeder and we just scanned 
> and pushed into Mayan 3 multipage documents, with 2 of them using the same 
> metadata and one with different metadata. We can create more 'control 
> messages' for new cover page types as we need them along the road, and they 
> can cover several user scenarios. We can create an 'End document' cover page 
> if needed. The cover page is just a blank page with a QR code. Control cover 
> pages can be physically reused or photocopied; only cover pages with dynamic 
> user data, like the set metadata cover page, would need to be printed more 
> than once if the metadata changes, but if the metadata is periodic, like 
> say vendor names, they can also be reused.

Sounds good to me! It covers all scenarios I can imagine, including the one to
reset the metadata with a metadata cover only containing empty values.
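
To check my understanding of the scheme, here is a minimal Python sketch of
the splitting logic. It assumes the QR codes have already been decoded into a
control code plus optional values. The names Page, Document and split_batch
and the codes SET_METADATA and NEW_DOCUMENT are invented for illustration
only and are not existing Mayan code. It also assumes that a 'set metadata'
cover replaces the previous values entirely, which is just one possible
design.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Page:
    """One scanned sheet (hypothetical structure, not a Mayan model)."""
    content: str = ""
    # None for ordinary pages; otherwise the control code decoded
    # from the cover page's QR code.
    control: Optional[str] = None
    # Values carried by a 'set metadata' cover page.
    payload: Dict[str, str] = field(default_factory=dict)


@dataclass
class Document:
    metadata: Dict[str, str]
    pages: List[str] = field(default_factory=list)


def split_batch(pages: List[Page]) -> List[Document]:
    """Split one scanned batch into documents using control cover pages."""
    documents: List[Document] = []
    current_metadata: Dict[str, str] = {}
    current: Optional[Document] = None

    for page in pages:
        if page.control == "SET_METADATA":
            # Assumption: new values replace the old ones completely.
            current_metadata = dict(page.payload)
            current = None  # the next content page starts a new document
        elif page.control == "NEW_DOCUMENT":
            current = Document(metadata=dict(current_metadata))
            documents.append(current)
        else:
            if current is None:
                # Content without a preceding 'new document' cover
                # still opens a document with the active metadata.
                current = Document(metadata=dict(current_metadata))
                documents.append(current)
            current.pages.append(page.content)

    # Drop documents that never received a content page.
    return [doc for doc in documents if doc.pages]
```

Feeding it the paper sandwich from your example gives three two-page
documents, the first two with vendor 1 and the third with vendor 2. A
metadata cover with empty values simply overwrites the active values, which
is the reset case mentioned above.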

> > > > - Rather straightforward is a sort of recognition, where templates can 
> > be 
> > > >   defined containing regions formatted in an individual way. E.g. if 
> > you 
> > > > have a 
> > > >   supplier with his custom invoice format displaying the invoice 
> > number, 
> > > > date, 
> > > >   amount at fixed places, they could be used on such a template and 
> > the 
> > > > software 
> > > >   can check, if the document contains such a region. 
> > > > 
> > > 
> > > 
> > > Regional OCR is a must have feature and usually a defining feature of 
> > the 
> > > commercial offerings, I don't know how accurate OCRing a rectangle of 
> > text 
> > > would be but if there is a need for the feature let's do it. I see some 
> > > requirements, we need a way to let users mark/highlight the fields they 
> > > want scanned and entered as metadata. This would require some design 
> > > decisions (do we store the cursor's x and y positions of the square to 
> > be 
> > > scanned or the x and y % in relation to the current zoom level) 
> >
> > The more agnostic of the zoom level, the better. So I would think x and y 
> > in 
> > relation to X and Y (where X and Y are the dimensions of the whole page). 
> >
>  
> 
> > > and a rich client w/ corresponding API endpoints to talk to the backend. 
> >
> > Do you mean, a separate client is needed for that purpose? 
> >
> 
> I meant that some interactive javascript/jquery code will be needed on the 
> template, sorry about the confusing wording :)

Thanks, that is clear to me now.
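
Coming back to the coordinates question: a small Python sketch of what I mean
by storing x and y in relation to X and Y. The region is kept as fractions of
the page dimensions, so it is independent of zoom level and scan resolution.
The names Region, from_pixels and to_pixels are hypothetical and only serve
the illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Region:
    """Rectangle stored as fractions (0..1) of the page width and height."""
    left: float
    top: float
    width: float
    height: float


def from_pixels(x: float, y: float, w: float, h: float,
                page_width: float, page_height: float) -> Region:
    """Convert a rectangle drawn at some zoom level into page fractions.

    page_width and page_height are the page dimensions at the same zoom
    level at which the rectangle was drawn.
    """
    return Region(x / page_width, y / page_height,
                  w / page_width, h / page_height)


def to_pixels(region: Region, page_width: int,
              page_height: int) -> Tuple[int, int, int, int]:
    """Map the stored fractions back to pixels for a given rendering size."""
    return (round(region.left * page_width),
            round(region.top * page_height),
            round(region.width * page_width),
            round(region.height * page_height))
```

A rectangle marked in the browser at one zoom level can then be applied to
the full resolution image on the backend by calling to_pixels with the
dimensions of that image.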



-- 

    Mathias Behrle
    PGP/GnuPG key available from any keyserver, ID: 0x8405BBF6
