Hi Simone,

I am on holidays right now. I will be able to look at this and the extractors further after I get back. Feel free to look at the branches on my GitHub repository if you want to clean up what I have done. I am stuck on the Tika-1.2 update branch, as I have no idea why Xerces fails to parse the document in question, and the other branches are still not quite feature complete.

Cheers,

Peter

On 30 December 2012 00:57, Simone Tripodi <[email protected]> wrote:
> Hi Peter,
>
> do you have any progress on it? :)
>
> TIA, all the best!
> -Simo
>
> http://people.apache.org/~simonetripodi/
> http://simonetripodi.livejournal.com/
> http://twitter.com/simonetripodi
> http://www.99soft.org/
>
>
> On Wed, Sep 19, 2012 at 10:08 PM, Peter Ansell <[email protected]> wrote:
>> Hi Simone,
>>
>> In my github repository I had split out the CLI and the extractors in
>> addition to what has already been split out. The extractors will be
>> harder than the CLI to split out from my experience. If you want to
>> look at what my old set of patches resulted in, you can see them at
>> [1]. The actual git commits won't be much use as I have not rebased
>> them on the current trunk, but you could browse the code to see which
>> classes ended up where.
>>
>> The main reason that I went through the splitting process was that I
>> wanted to reduce the number of dependencies for my projects that are
>> reusing the mime and encoding modules. The dependency on tika is much
>> clearer in that case, and then it is just a case of using maven to
>> prune off the tika dependencies that are not needed for each project.
>>
>> Once we get to the point where there are not any more logical modules,
>> or they wouldn't be useful on their own, we may want to rename the
>> core module to "utils" or something like that.
>>
>> I am going on holidays for a few weeks so feel free to start on
>> changes in the meantime.
>>
>> Thanks for fixing up the formatting and license headers by the way. I
>> will try to be more vigilant in the future to get it right the first
>> time.
>>
>> Cheers,
>>
>> Peter
>>
>> [1] https://github.com/ansell/any23/tree/oldpatches
>>
>> On 20 September 2012 01:05, Simone Tripodi <[email protected]> wrote:
>>> Hi all guys,
>>>
>>> Peter did a terrific work, I had the chance to review it (just added
>>> missing headers and other minor stuff) but it is perfectly clear the
>>> effort he invested on it, and having a better modularization helps
>>> IMHO a lot on understanding any23 internals. Kudos, Peter! :)
>>>
>>> It came in my mind to extract the CLI tools implementation from the
>>> core and putting them in a proper module, in order to have a better
>>> separation of concerns - and when embedding APIs, users don't need to
>>> bring those classes.
>>>
>>> WDYT? Any objection?
>>>
>>> TIA,
>>> -Simo
>>>
>>> http://people.apache.org/~simonetripodi/
>>> http://simonetripodi.livejournal.com/
>>> http://twitter.com/simonetripodi
>>> http://www.99soft.org/
