I have just submitted an enhancement to Mayan-EDMS that not only allows you
to validate data, but to normalize data before storage. I think this is
EXACTLY what you were asking for. I have attached a description of the
change.
On Thursday, May 15, 2014 6:06:09 AM UTC-4, Mathias Behrle wrote:
>
> * Jason Heeris: " [Mayan EDMS: 692] Is there a way to enforce metadata
> structure for eg. dates?" (Thu, 15 May 2014 01:30:11 -0700 (PDT)):
>
> Hi Jason,
>
> > I've just installed Mayan EDMS 0.13 (using the VirtualBox .ova file), and
> > have been playing around with it a bit.
> >
> > I'm wondering whether I can enforce a particular structure for metadata
> > when it's entered. For example, one important piece of metadata is the date
> > a document was received (note that this is almost certainly different to
> > the date it is uploaded). At some point I will probably want to filter my
> > documents based on this information (I don't know if that's possible with
> > Mayan, but I'm hoping so).
> >
> > But it's possible I could enter "2014-01-01" for one document, and then a
> > few weeks later forget what format I used last time and enter "2/1/2014"
> > for another, and so on. This seems like the sort of mistake that structured
> > metadata is good at preventing. Another example might be currency metadata
> > ("$40" vs "10.15" vs "50c").
> >
> > Is this something I can do easily with Mayan?
>
> AFAIK there is no input validation for metadata until now. The only
> procedure I can recommend so far is to put current_date() in the standard
> field of the metadata type to show the correct input format.
>
> Cheers,
> Mathias
>
>
> --
>
> Mathias Behrle
> PGP/GnuPG key available from any keyserver, ID: 0x8405BBF6
>
Adds ability to validate and normalize metadata.
I felt that it would be very handy to be able to validate
user-supplied metadata. It occurred to me that if a metadata
type had an explicit list of options, it would need no validation.
Therefore, the "lookup" field of a metadata type could be overloaded
to provide EITHER a list of items that could be selected by the user
OR a function to provide data validation. The system, therefore,
would need to be able to discriminate between a lookup function
and a validation function.
To this end, I created a global variable
('METADATA_AVAILABLE_VALIDATORS') to contain a dictionary of
available validation functions. If the name specified in
'metadata_type.lookup' is present in METADATA_AVAILABLE_VALIDATORS,
the system treats the function as a validator. Otherwise, the
function is treated as a generator of an iterable value providing
the choices for the user.
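
The decision described above could be sketched roughly as follows. This
is only an illustration, not the code from the submitted change:
'classify_lookup', the placeholder validator body, and the sample lookup
string are hypothetical. Only METADATA_AVAILABLE_VALIDATORS and the
'lookup' field come from the description.

```python
# Placeholder validator, present only so the registry has an entry.
def is_valid_posting_date(value):
    return value


# Hypothetical registry; in practice this would be defined in settings.
METADATA_AVAILABLE_VALIDATORS = {
    'is_valid_posting_date': is_valid_posting_date,
}


def classify_lookup(lookup_name, validators=METADATA_AVAILABLE_VALIDATORS):
    """Decide how a metadata type's 'lookup' value should be treated.

    Returns ('validator', func) when the name is a registered validator,
    otherwise ('choices', name) so the caller can treat it as an
    expression that generates the list of choices.
    """
    name = (lookup_name or '').strip()
    if name in validators:
        return 'validator', validators[name]
    return 'choices', name
```

With this sketch, a 'lookup' of "is_valid_posting_date" is classified as
a validator, while any unregistered value (or an empty one) falls through
to the existing choice-list behaviour.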
Django contains a pre-existing mechanism to support field
validation. A validator has a single argument (the data to
be validated). If the argument to the validator is valid,
the validator simply returns. If there is a problem with
the data, the validator raises a 'ValidationError' exception
and passes an error message which is then displayed by Django
as a mouseover tip in the browser. Validators to be used
with Mayan-EDMS may follow this convention (i.e., take a
single argument and raise an exception if the validation
fails). The validators in Mayan-EDMS, however, may actually
do more!
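
For reference, a validator that follows only the plain Django convention
(raise on bad input, return nothing on good input) might look like the
sketch below. The name 'is_iso_date' and the error message are
hypothetical, and the fallback ValidationError class exists only so the
example runs where Django is not installed.

```python
import datetime

try:
    from django.core.exceptions import ValidationError
except ImportError:
    # Stand-in so the sketch runs without Django; not part of Mayan-EDMS.
    class ValidationError(Exception):
        pass


def is_iso_date(value):
    """Validate only: raise unless value parses as a year-month-day date.

    Nothing is returned on success, so the caller keeps the original
    input unchanged.
    """
    try:
        datetime.datetime.strptime(value, '%Y-%m-%d')
    except (TypeError, ValueError):
        raise ValidationError('Enter a date in YYYY-MM-DD format.')
```

Here "2014-01-01" passes silently, while "2/1/2014" raises a
ValidationError.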
If a validator function RETURNS a value, that value is used
in place of the original data. This allows the validator to
make data conform to a valid value or to "normalize" a value
before it is stored in the database. This allows for more
uniform metadata and improves the ability to index on the
metadata values. Let's take a look at an example of this
functionality.
Assume that a document requires a date (perhaps an
"original posting date"). We can have a 'metadata_type' of
"original_posting_date", and we can create a validator with
the name "is_valid_posting_date". The validator (which is
placed in a module read by the settings routine) is defined
as follows:
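
    from dateutil import parser
    from django.core.exceptions import ValidationError


    def is_valid_posting_date(value):
        """Validate a date string and normalize it to ISO format."""
        try:
            dt = parser.parse(value)
        except (ValueError, OverflowError):
            # dateutil could not interpret the input as a date.
            raise ValidationError('Invalid date')
        # The returned value is stored in place of the user's input.
        return dt.date().isoformat()
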
The function is then registered in a dictionary in the user's
settings file, thus:

    import my_settings

    METADATA_AVAILABLE_VALIDATORS = {
        'is_valid_posting_date': my_settings.is_valid_posting_date,
    }

The user creates a metadata type called "original_posting_date"
with a label of "Original Posting Date" and a 'lookup' value
of "is_valid_posting_date". When the metadata form is filled
in and submitted, the date value is validated by our validator.
Since the dateutil 'parser.parse' function accepts many kinds of input,
the user can enter (for example) '9/1/2014', '2014/10/2',
or even 'Feb 4, 2001'. If the user enters something that
does not (as far as dateutil is concerned) represent a valid date,
the validator will raise a "ValidationError" and the form will
be re-displayed with an appropriate error message. If, however,
the data is valid, the value of the field (and, hence, the value stored
in the database) will be "normalized" to ISO format YYYY-MM-DD.
This allows consistent lookup and indexing regardless of the
user's particular idiosyncrasies.
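
The same idea applies to the currency case raised in the original
question ("$40" vs "10.15" vs "50c"). The sketch below is an assumption
about how such a validator and the "use the returned value" rule could
look. 'normalize_currency' and 'apply_validator' are hypothetical names
and are not part of the submitted change; the fallback ValidationError
class exists only so the example runs without Django.

```python
from decimal import Decimal, InvalidOperation

try:
    from django.core.exceptions import ValidationError
except ImportError:
    # Stand-in so the sketch runs without Django; not part of Mayan-EDMS.
    class ValidationError(Exception):
        pass


def normalize_currency(value):
    """Accept '$40', '10.15' or '50c' and return a two-decimal string."""
    text = value.strip()
    try:
        if text.lower().endswith('c'):
            # A trailing 'c' means the amount is given in cents.
            amount = Decimal(text[:-1]) / 100
        else:
            amount = Decimal(text.lstrip('$'))
    except InvalidOperation:
        raise ValidationError('Invalid amount')
    if not amount.is_finite():
        # Reject inputs such as 'NaN' or 'Infinity'.
        raise ValidationError('Invalid amount')
    return str(amount.quantize(Decimal('0.01')))


def apply_validator(validator, value):
    """Run the validator; store its return value if it provides one."""
    result = validator(value)
    return value if result is None else result


print(apply_validator(normalize_currency, '$40'))    # 40.00
print(apply_validator(normalize_currency, '50c'))    # 0.50
print(apply_validator(normalize_currency, '10.15'))  # 10.15
```

A validator that returns nothing leaves the input untouched, so existing
Django-style validators keep working under this rule.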