On Thu, 7 Aug 2014, Tyler Palsulich wrote:
> Sounds like the new module is a good idea. So, let's jump on it! I will create a new 'example' JIRA tag and create issues for creating the module and adding Parse, Detect, and Translate examples. Others should add issues/desired examples as they see fit. How's that sound?

I wonder if it's worth approaching those crazy fools who wrote a book on Tika, to see if we could pinch one or two of their examples? If only we knew who they were... ;-)


Recursion is one that causes confusion. We've got some example programs on the wiki that we can include:
https://wiki.apache.org/tika/RecursiveMetadata

For the advanced metadata stuff, Ray Gauss is probably our best bet to send in some examples!

Another one that has generated mailing list traffic lately is embedded images, including re-writing links to them. There's some (LGPL) code in Alfresco which I wrote a few years ago to do that. Ray might be able to get the nod to contribute it (or a cut-down version) as an example of that style of parsing HTML + embedded resources in parallel.
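
To give a rough idea of the link re-writing half of that, here's a minimal sketch. It is not the Alfresco code and makes no Tika calls. It assumes the embedded images have already been pulled out, and that we hold a map from the original src value to wherever each image ended up. The class name, the method name and the "embedded:" style of src value are all made up for illustration, and a regex is used only to keep it short (a real example would more likely rewrite the attribute as the SAX events go past).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: rewrites <img src="..."> values using a lookup map.
final class EmbeddedImageLinkRewriter {

    // Group 1: everything up to and including "src=", group 2: the quote
    // character, group 3: the src value itself.
    private static final Pattern IMG_SRC = Pattern.compile(
            "(<img\\b[^>]*?\\bsrc\\s*=\\s*)([\"'])(.*?)\\2",
            Pattern.CASE_INSENSITIVE);

    // locations maps an original src value to the new location of that image.
    // Any src not present in the map is left untouched.
    static String rewrite(String html, Map<String, String> locations) {
        Matcher m = IMG_SRC.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String original = m.group(3);
            String target = locations.getOrDefault(original, original);
            String rebuilt = m.group(1) + m.group(2) + target + m.group(2);
            // quoteReplacement stops '$' and '\' in paths being treated as
            // group references.
            m.appendReplacement(out, Matcher.quoteReplacement(rebuilt));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> locations = new HashMap<>();
        locations.put("embedded:image1.png", "/files/doc1/image1.png");
        String html = "<p><img src=\"embedded:image1.png\" alt=\"logo\"></p>";
        System.out.println(rewrite(html, locations));
    }
}
```

The map is the only link between the two halves: whatever extracts the embedded resources fills it in, and the HTML side just looks things up.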

Nick
