Tyler Palsulich created TIKA-1420:
-------------------------------------

             Summary: Add Metadata Extraction to Arbitrary Parsers
                 Key: TIKA-1420
                 URL: https://issues.apache.org/jira/browse/TIKA-1420
             Project: Tika
          Issue Type: Improvement
          Components: parser
            Reporter: Tyler Palsulich
            Priority: Minor


Suppose you wish to extract information from arbitrary file types and add it to 
a Metadata Object. This type of task is best handled by a... Handler. But, 
Handlers do not have access to the Metadata Object passed to a Parser. 

So, I see a few ways we could do using existing functionality.

1) Make an intermediate XML representation of the desired metadata in a 
handler, then convert the XML to the Metadata after parsing. 

2) Create a second Parser which extracts the desired information.
     a) Assume the Handler passed to this Parser is already filled with 
content. So, we could simply get whatever content from the Handler and populate 
the Metadata directly.
     b) Create a new Stream in the first Parser to pass to the second, which in 
turn populates the Metadata.

None of these options seem ideal. Is there a better way to handle this 
scenario? Or, can we create some sort of... wrapper for a Handler which can 
accept a Metadata Object to populate directly? 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to