Hey Nick! :) I'd have no problem pinching the code from Tika in Action. I wonder if the Manning folks would mind.
I'll reach out to them.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: Nick Burch <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Thursday, August 7, 2014 2:42 PM
To: "[email protected]" <[email protected]>
Subject: Re: [DISCUSS] Give examples of Parser, Detector, and Translator usage

>On Thu, 7 Aug 2014, Tyler Palsulich wrote:
>> Sounds like the new module is a good idea. So, let's jump on it! I will
>> create a new 'example' JIRA tag and create issues for creating the
>> module and adding Parse, Detect, and Translate examples. Others should
>> add issues/desired examples as they see fit. How's that sound?
>
>I wonder if it's worth approaching those crazy fools who wrote a book on
>Tika, to see if we could pinch one or two of their examples? If only we
>knew who they were... ;-)
>
>
>Recursion is one that causes confusion, we've got some example programs on
>the wiki that we can include:
>https://wiki.apache.org/tika/RecursiveMetadata
>
>Ray Gauss is probably our best bet for advanced metadata stuff to send in
>some examples on that!
>
>Another one that has generated mailing list traffic lately is embedded
>images, including re-writing links to them. There's some (LGPL) code in
>Alfresco which I wrote a few years ago to do that, Ray might be able to
>get the nod to contribute that (or a cut-down version) as an example of
>that style of parsing html + embedded resources in parallel
>
>Nick
