Hey all, I finally have time to experiment with Mayan-EDMS some more, so I'm back to trying to get https://gitlab.com/startmat/document_analyzer working the way I want.
Unfortunately, I can't seem to figure it out.

I'm currently testing on a Vagrant instance (see https://gitlab.com/mayan-edms/mayan-edms-vagrant). I ended up copying the document_analyzer app into the apps directory to get it loading.

I am using an Albertsons receipt to test with. The first two lines of OCR look like:

    4S Albertsons
    It's just better.

I made an analyzer and assigned the 'receipt' document type to it. (That's the type I added, and it's the type the Albertsons receipt's properties page shows.) Parameter:

    first;(?ims)(?P<albertsons>(.*Albertsons.*))

This should cause document_analyzer to add an "albertsons" field to either the metadata or the properties of the document. Am I wrong?

I also made an analyzer based on the document_analyzer README. Parameter:

    first;(?i)(?P<Creator>Tele2|Apple|Microsoft|Billa|Albertsons)

I just added "Albertsons" to the list of words to look for. This should cause document_analyzer to add a "Creator" field to either the metadata or the properties of the document. Am I wrong?

I used the menu item "Submit to analyze" (http://localhost:8080/document_analyzer/analyzer/1/submit/) to run document_analyzer. All I can see in the logs is that I clicked that menu item. The document's properties and metadata do not change; nothing is added to either of them.

If I test (?ims).*albertsons.* on http://www.pyregex.com/ against the first two lines of the document, it reports a success.
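One way to rule out the patterns themselves is to run them through Python's re module against the same two OCR lines. This is a minimal sketch that assumes document_analyzer hands the text after the "first;" prefix to re unchanged (not confirmed from its source); named_groups is a hypothetical helper for illustration.

```python
import re

# The first two OCR lines of the test receipt, as quoted above.
OCR_TEXT = "4S Albertsons\nIt's just better.\n"

# The two analyzer parameters with the "first;" prefix stripped.
# Assumption: document_analyzer passes this part to Python's re module as-is.
PATTERNS = {
    "custom": r"(?ims)(?P<albertsons>(.*Albertsons.*))",
    "readme": r"(?i)(?P<Creator>Tele2|Apple|Microsoft|Billa|Albertsons)",
}


def named_groups(pattern, text):
    """Hypothetical helper: named groups of the first match, or None if no match."""
    match = re.search(pattern, text)
    return match.groupdict() if match else None


if __name__ == "__main__":
    for name, pattern in PATTERNS.items():
        print(name, named_groups(pattern, OCR_TEXT))
```

Both patterns match, so under that assumption the regexes are probably not what is failing. Note that because of the s (DOTALL) flag, .* also crosses line breaks, so the albertsons group captures the entire OCR text rather than one line; with (?im) instead, it would capture only "4S Albertsons".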
/usr/share/mayan-edms/mayan/settings/local.py looks like:

    from __future__ import absolute_import, unicode_literals

    from .base import *

    SECRET_KEY = '5(kv&ow31r2m9e^#c65v%ppiwiv9epu-hxa*1jsa1#m5bi!g7+'

    DATABASES = {
        'default': {
            'ENGINE': 'django.db.backends.postgresql_psycopg2',
            'NAME': 'mayan_edms',
            'USER': 'mayan',
            'PASSWORD': 'test123',
            'HOST': 'localhost',
            'PORT': '5432',
        }
    }

    INSTALLED_APPS += (
        'document_analyzer',
    )

    BROKER_URL = 'redis://127.0.0.1:6379/0'
    CELERY_RESULT_BACKEND = 'redis://127.0.0.1:6379/0'

    LOGGING = {
        'version': 1,
        'disable_existing_loggers': True,
        'formatters': {
            'verbose': {
                'format': '%(levelname)s %(asctime)s %(name)s %(process)d %(thread)d %(message)s'
            },
            'intermediate': {
                'format': '%(name)s <%(process)d> [%(levelname)s] "%(funcName)s() %(message)s"'
            },
            'simple': {
                'format': '%(levelname)s %(message)s'
            },
        },
        'handlers': {
            'console': {
                'level': 'DEBUG',
                'class': 'logging.StreamHandler',
                'formatter': 'intermediate'
            }
        },
        'loggers': {
            #'documents': {
            #    'handlers': ['console'],
            #    'propagate': True,
            #    'level': 'DEBUG',
            #},
            #'common': {
            #    'handlers': ['console'],
            #    'propagate': True,
            #    'level': 'DEBUG',
            #},
            'document_analyzer': {
                'handlers': ['console'],
                'propagate': True,
                'level': 'DEBUG',
            },
        }
    }

Does anyone have any tips? Am I missing a step somewhere?
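To confirm that the LOGGING dict by itself lets DEBUG messages from document_analyzer through, it can be loaded in isolation with logging.config.dictConfig. This sketch trims the dict to the document_analyzer logger; the child logger name document_analyzer.tasks is hypothetical. If the analysis runs as a Celery task (an assumption suggested by the BROKER_URL setting, not confirmed), its messages would show up in the Celery worker's console rather than in the web server's log.

```python
import logging
import logging.config

# LOGGING from local.py, trimmed to the parts that affect "document_analyzer".
LOGGING = {
    "version": 1,
    "disable_existing_loggers": True,
    "formatters": {
        "intermediate": {
            "format": '%(name)s <%(process)d> [%(levelname)s] "%(funcName)s() %(message)s"'
        },
    },
    "handlers": {
        "console": {
            "level": "DEBUG",
            "class": "logging.StreamHandler",
            "formatter": "intermediate",
        }
    },
    "loggers": {
        "document_analyzer": {
            "handlers": ["console"],
            "propagate": True,
            "level": "DEBUG",
        },
    },
}


def check_logging():
    """Apply the config and report whether a child logger emits DEBUG records."""
    logging.config.dictConfig(LOGGING)
    # Hypothetical module logger inside the app, e.g. logging.getLogger(__name__)
    # in a document_analyzer/tasks.py; it inherits DEBUG from "document_analyzer".
    child = logging.getLogger("document_analyzer.tasks")
    child.debug("debug output from a document_analyzer module reaches the console")
    return child.isEnabledFor(logging.DEBUG)


if __name__ == "__main__":
    print(check_logging())
```

If this prints the debug line and True, the settings themselves are not hiding the app's messages, and the next place to look would be whichever process actually executes the analysis.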
