On Wednesday, 27 April 2016 at 19:47:18 UTC+2, Roberto Rosario wrote:
>
> On Saturday, April 9, 2016 at 6:57:33 PM UTC-4, Bruno CAPELETO wrote:
>>
>> Dear Roberto,
>>
>> This is the third night I am spending on Mayan EDMS and it is so
>> promising! Congratulations to you!
>
> Thank you! We have worked very hard.
>
>> In order to test it I am trying to implement the workflows of my own
>> company, and there are several comments I would like to make (is this
>> the right place?):
>
> Yes, it's better to discuss things on the mailing list and, once a bug
> or solution is more concrete, to add it as an issue in the code ticket
> system.
>
>> - The documentation is too short and too advanced: I have a lot of
>> experience with Linux but not with Python, and it was difficult for me
>> to set it up. To raise awareness of your software, we should make it
>> accessible to people other than specialists in Linux, Python and
>> programming. In particular, it must be usable by entrepreneurs who have
>> concrete examples to manage. By the way, I would be glad to contribute
>> to the French translation (how can I do that?)
>
> Yes, the documentation is always a pain point. English is not my first
> language and, while I'm bilingual, you can tell it is not written by a
> native speaker. Contributions to the documentation are urgently needed.
> Agreed on making it more accessible; that is why there is a new chapter
> on deploying, and Docker/Docker Compose sections with images handled by
> a separate repository. I added an entry to the roadmap mentioning the
> need to make Debian packages; this way Mayan could be installed with
> Debian's or Ubuntu's apt-get command. There is also the commercial
> hosted cloud version being developed
> (https://mayan-edms.us13.list-manage.com/subscribe/post?u=0b6b071ee8fc1197c9e8cd6fe&id=1fe0489557).
>
> Any other ideas to lower the entry barrier are welcome.
> The translation can be managed locally using the Django Rosetta package
> or by using Transifex
> (https://mayan.readthedocs.org/en/latest/topics/translations.html); I
> just saw you there completing the French translation, thanks!
I've just registered on Transifex, but I can ask to contribute to any
language except French! I probably did something wrong, but I cannot find
what...

>> - The "Actions" menu must become more convenient: for example, to add a
>> metadata value after scanning, one has to view the document, go to the
>> "metadata" tab, then "Actions", and only then add the metadata. Even
>> then, it is cumbersome if one has several metadata fields to fill in.
>> I suggest at least making the "Actions" menu open as a submenu on mouse
>> hover.
>
> In the past all actions were displayed in the view, and the common
> complaint was that having so many options at once "confused the eyes",
> hence the dropdown was added. Perhaps we could keep the dropdown but
> show all options without having to switch the document view context.
> Example: show "Add metadata" even when you are in the document tag list
> view.

Assigning and/or confirming document metadata is (I guess) key to the
success of the software, so that kind of functionality should have
priority over everything else (in development and in the GUI). Ideally
(but I have no idea whether this is doable with Django) a dedicated
keyboard shortcut would be the best option; at the very least, a quick
way via the GUI.

Still concerning metadata:

* I realize that one can assign metadata to multiple documents at once.
What a pity the documents must all be of the same type! Let's take an
example: I have a paper customer folder with a bill (type 1) and a
confirmation of payment (type 2). Both types have, among others, the same
metadata fields "customer_code" and "bill_no". The software should make
it possible to assign the shared metadata together (the dropdown list
would exclude all metadata fields that are specific to one type, but
still propose the ones the types have in common).

* When assigning metadata, the software asks the user to choose which
field, even if there is only one possibility (only one metadata field
associated with the type).
In that case, this step should be skipped so that the value of the
metadata can be assigned directly.

>> - Still after scanning, there is a great function to fill in missing
>> metadata; again, this option is hidden. I suggest that, as soon as a
>> document enters the system via a watch folder and is missing metadata,
>> a blinking "set metadata" icon invites the user to set it. The content
>> of the metadata could even be suggested. For example, a document type
>> "Bill" can have a "Supplier" metadata field. For each possible value of
>> "Supplier", like "EDF" (a French electricity company), an associated
>> list of keywords to detect via OCR would help suggest the best value
>> for a list metadata field. It is essential to easily assign metadata to
>> incoming documents so that they can be classified. We still cannot rely
>> completely on OCR :-(
>
> I would try to make this as obtrusive as possible; how about a blinking
> exclamation sign instead? As for predictive values, you said it
> yourself: OCR is not to be relied upon. A suggestion I've gotten in the
> past is splitting the OCR results into two tabs: extracted text content
> and recognized text content. One would show the text as extracted from
> documents that have the text embedded (text, RTF, DOC, PDF) and the
> other from image documents where the text was recognized by OCR. This
> separation would allow logic based on "good" text and leave the OCR text
> as a convenience. With text known to be good, better text processing can
> be done. In the past, things like fuzzy matching and regular expressions
> have been suggested for metadata suggestions like you mention, and even
> for workflow transitions and automatic tagging. Does that make sense?

I would suggest another approach, because OCR cannot be 100% reliable.
Sorry if what I suggest is totally unrealistic; I have no idea of the
possible technical complexity. It would be nice if the software "learned"
about the documents.
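The keyword-per-value suggestion discussed above, combined with the fuzzy matching Roberto mentions, could be sketched roughly like this. This is only an illustration, not Mayan code: the keyword lists and the `suggest_value` function are invented for the example, and real OCR text would need more careful normalization.

```python
import difflib

# Hypothetical keyword lists per possible "Supplier" metadata value.
SUPPLIER_KEYWORDS = {
    "EDF": ["edf", "electricite de france", "kwh"],
    "UBS": ["ubs", "bank statement"],
}

def suggest_value(ocr_text, keyword_map, cutoff=0.8):
    """Score each candidate value by (fuzzy) keyword hits in the OCR text."""
    text = ocr_text.lower()
    tokens = text.split()
    scores = {}
    for value, keywords in keyword_map.items():
        hits = 0
        for keyword in keywords:
            if " " in keyword:
                # Multi-word keyword: plain substring check.
                hits += keyword in text
            elif difflib.get_close_matches(keyword, tokens, n=1, cutoff=cutoff):
                # Single word: tolerate small OCR errors via fuzzy matching.
                hits += 1
        scores[value] = hits
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

print(suggest_value("Facture EDF - 350 kWh consommes", SUPPLIER_KEYWORDS))  # EDF
```

Returning None when nothing matches keeps the user in control: the value is only a suggestion to confirm, never an automatic assignment.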
Meaning: every time a user assigns a value to a metadata field (or
confirms a suggested value), the software builds its own "guess
database". An illustration with, for example, document type = bank
statement:

- one metadata field for the bank name: UBS, HSBC, CREDIT AGRICOLE
- one metadata field for the related bank account: in the same bank
(e.g. UBS), two accounts 05A45678AHF and 05A78948AHG

At the beginning, the software has no idea of either the bank or the
account, so every metadata value is assigned manually. With time, the
software starts to see similarities (e.g. thanks to OCR, but why not
other tools recognizing images, text placement, etc.) between all
documents related to UBS, and between all documents related to bank
account 05A45678AHF, and builds a kind of template. At a certain point
the model is strong enough: when a new bank statement enters the system,
the software uses the templates to try to predict the metadata. If there
is a match between a template and the document, the software assigns the
metadata, but with a question mark. When a user views the document it
must be clear that the metadata was "guessed", and there must be a
one-click option (or keyboard shortcut) to confirm or reject it. If it
is confirmed, the new document becomes part of the model, improving the
predictions for the next documents. That kind of algorithm could even be
extended to guess the document type, the label, the folder, etc.

>> - Autoincrement variables: it would be great (and a must) to be able
>> to link different documents together. For example, I have an order
>> (let's say number 001) and I want to be able to link all the related
>> documents to that order. I need to set this 001 somehow in a metadata
>> field of the other documents. This kind of variable could be assigned
>> to a document type. Or "autoincrement" could be a property of a
>> metadata field for a given document type, and a choice list for the
>> other document types.
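As a rough illustration of the "guess database" described above, the sketch below (standalone Python, hypothetical class and method names, not part of Mayan) counts which OCR tokens have co-occurred with each confirmed metadata value and predicts the value with the highest token overlap:

```python
from collections import Counter, defaultdict

class MetadataGuesser:
    """Tiny 'guess database': remembers which OCR tokens co-occur with
    which confirmed metadata values, then predicts by token overlap."""

    def __init__(self):
        self.token_counts = defaultdict(Counter)  # value -> token frequencies

    def confirm(self, ocr_text, value):
        """Called whenever a user assigns or confirms a metadata value."""
        self.token_counts[value].update(ocr_text.lower().split())

    def guess(self, ocr_text):
        """Return (value, score) for the best-matching known value, or None."""
        tokens = set(ocr_text.lower().split())
        best, best_score = None, 0.0
        for value, counts in self.token_counts.items():
            total = sum(counts.values())
            score = sum(counts[t] for t in tokens) / total
            if score > best_score:
                best, best_score = value, score
        return (best, best_score) if best else None

guesser = MetadataGuesser()
guesser.confirm("ubs bank statement account 05A45678AHF", "UBS")
guesser.confirm("hsbc statement of account", "HSBC")
print(guesser.guess("ubs statement for march"))  # ('UBS', 0.4)
```

Returning a score alongside the guess makes the "question mark" state easy to implement: below a chosen threshold the value is shown as unconfirmed, and each user confirmation feeds confirm() again so the model improves over time.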
>> Another possibility would be to easily see the original document's uid.
>
> As you mentioned in your reply, the auto renaming app would solve this
> (https://gitlab.com/mayan-edms/document_renaming). Do you see value in
> adding this as a permanent app in the core of Mayan?

Definitely yes. Actually, the original name of a document is, in my own
experience, meaningless because it is assigned by the scanning machine:
typically SKMBT_C22016050915170.pdf, which tells you it was produced by a
Konica Minolta C220 on 9 May 2016 at 15:17. The only information that
could be retrieved from that is the date/time, and this information is
already available in Mayan because the document enters the EDMS at the
same time it is scanned. Of course, this is true only if the main entry
into the system is by scanning.

>> - Workflows are a good start, with the possibility to set a status;
>> however, I was not able to exploit this functionality at all (that was
>> the subject of my recent post). At the very least, one should be able
>> to classify documents based on the state of the workflow via the
>> indexes.
>
> Workflows will be the main focus of the next development cycle (version
> 2.2), so all suggestions and ideas are welcome and we are at a good time
> to brainstorm about the kind of features to add. Check out the roadmap
> wiki document for an idea of what is being considered.
>
>> - I love the "Indexes" concept.
>
> Thank you! It is one of the "power features" that sets Mayan apart :)
>
>> Hope this feedback can help improve the product.
>
> It does, thank you, and keep it coming!
>
>> Cheers,
>> Bruno

--
---
You received this message because you are subscribed to the Google Groups
"Mayan EDMS" group. To unsubscribe from this group and stop receiving
emails from it, send an email to [email protected].
For more options, visit https://groups.google.com/d/optout.
