On Thu, 7 Aug 2014, Tyler Palsulich wrote:
> Sounds like the new module is a good idea. So, let's jump on it! I will
> create a new 'example' JIRA tag and create issues for creating the
> module and adding Parse, Detect, and Translate examples. Others should
> add issues/desired examples as they see fit. How's that sound?

I wonder if it's worth approaching those crazy fools who wrote a book on
Tika, to see if we could pinch one or two of their examples? If only we
knew who they were... ;-)
Recursion is one area that causes confusion; we've got some example
programs on the wiki that we can include:
https://wiki.apache.org/tika/RecursiveMetadata
Ray Gauss is probably our best bet for sending in some examples of
advanced metadata handling!
Another topic that has generated mailing list traffic lately is embedded
images, including rewriting links to them. There's some (LGPL) code in
Alfresco which I wrote a few years ago to do that. Ray might be able to
get the nod to contribute that (or a cut-down version) as an example of
that style of parsing HTML + embedded resources in parallel.
Nick