Chris A. Mattmann created TIKA-1441:
---------------------------------------
Summary: ExternalParsers should allow dynamic keys to be specified
for Regexs
Key: TIKA-1441
URL: https://issues.apache.org/jira/browse/TIKA-1441
Project: Tika
Issue Type: Bug
Components: parser
Environment: while working on TIKA-605 and memex
Reporter: Chris A. Mattmann
Assignee: Chris A. Mattmann
Fix For: 1.7
While working on TIKA-605, I was trying to use ExternalParsers and came
across an interesting use case. What if there are so many metadata (met) keys
that specifying each of them by hand as an individual regex would be repetitive
and tedious? What if the met key itself could also be captured by the regex,
e.g., we take the first group to be the key and the next group to be the
actual value? I ran across this while parsing GDAL output, so a very simple
improvement to the ExternalParsers' Map<Pattern, String> map would be to allow
it to take null or "" Strings as values, and to take that to mean that the
Pattern specifies *both* the key name *and* the key value.
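
To make the proposal concrete, here is a minimal sketch of the lookup logic
described above. It is not the actual patch or Tika's ExternalParser code: the
class name DynamicKeySketch, the extract method, and the sample GDAL-style text
are all made up for illustration. It assumes a non-empty map value is a fixed
key whose value is group 1, while a null or "" map value means group 1 is the
key and group 2 is the value.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch only; names and sample text are assumptions, not Tika API.
class DynamicKeySketch {

    /**
     * Applies every pattern to every line of the external tool's output.
     * Non-empty map value: fixed key, value taken from group 1.
     * Null or "" map value: key taken from group 1, value from group 2.
     */
    static Map<String, String> extract(Map<Pattern, String> patterns, String output) {
        Map<String, String> metadata = new LinkedHashMap<>();
        for (String line : output.split("\\R")) {
            for (Map.Entry<Pattern, String> entry : patterns.entrySet()) {
                Matcher m = entry.getKey().matcher(line);
                if (!m.find()) {
                    continue;
                }
                String fixedKey = entry.getValue();
                if (fixedKey == null || fixedKey.isEmpty()) {
                    // Dynamic key: the pattern supplies both key and value.
                    if (m.groupCount() >= 2) {
                        metadata.put(m.group(1).trim(), m.group(2).trim());
                    }
                } else if (m.groupCount() >= 1) {
                    // Existing behaviour: the key is given by hand.
                    metadata.put(fixedKey, m.group(1).trim());
                }
            }
        }
        return metadata;
    }

    public static void main(String[] args) {
        Map<Pattern, String> patterns = new LinkedHashMap<>();
        patterns.put(Pattern.compile("^Driver:\\s*(.+)$"), "driver");  // fixed key
        patterns.put(Pattern.compile("^\\s*([A-Z_]+)=(.*)$"), null);   // dynamic key

        // Invented sample text in the style of key=value tool output.
        String output = "Driver: GTiff/GeoTIFF\n"
                + "  AREA_OR_POINT=Area\n"
                + "  TIFFTAG_SOFTWARE=example\n";

        System.out.println(extract(patterns, output));
    }
}
```

With this shape, a single dynamic-key pattern covers every KEY=value line, so
only the keys that need special handling have to be listed by hand.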
I've got a patch that I'll upload. All tests pass, and I need this to get
TIKA-605 in and done.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)