On 12 May 2012 18:34, Mattmann, Chris A (388J) < [email protected]> wrote:
> Hi Peter,
>
> Thanks for your help and for a detailed explanation of what you did!
>
> I, for one, would be super supportive if you had time to figure out a way
> to get it into Apache Any23. I'm sure the rest of the PPMC would be happy
> and willing to work with you to develop JIRA issues/patches, etc., to
> facilitate this.
>

I would be happy! +1
Mic

>
> Thank you again for your work!
>

Thanks.
Mic

>
> Cheers,
> Chris
>
> On May 10, 2012, at 8:41 PM, Peter Ansell wrote:
>
> > Hi all,
> >
> > Over the past two days I have split up Any23 into a variety of modules
> > to make it easier to use different parts of the Any23 API. You can see
> > the code at [1]. The current module list in the parent pom reactor
> > looks like:
> >
> > <modules>
> >   <module>api</module>
> >   <module>csvutils</module>
> >   <module>encoding</module>
> >   <module>mime</module>
> >   <module>core</module>
> >   <module>test-resources</module>
> >   <module>extractor</module>
> >   <module>cli</module>
> >   <module>test</module>
> >   <module>service</module>
> >   <module>plugins/basic-crawler</module>
> >   <module>plugins/html-scraper</module>
> >   <module>plugins/office-scraper</module>
> >   <module>plugins/integration-test</module>
> >   <module>sources-dist</module>
> > </modules>
> >
> > None of the modules listed above core depend on core, and the core
> > module depends only on the api module.
> >
> > The api module mostly contains interfaces, but it also contains the
> > factory registries that are fully Service Provider Interface (SPI)
> > driven (Any23PluginManager and WriterFactoryRegistry, which I created
> > to alleviate the WriterRegistry hardcoding dependencies and
> > reflection/annotation code that isn't easy to extend outside of the
> > core library).
> > The ExtractorRegistry was too difficult to convert to SPI just yet,
> > so I split it up into an interface and an implementation
> > (ExtractorRegistryImpl), with the interface in the API module and used
> > in some APIs where the singleton was previously used. These
> > registries, together with Rio RDFFormat for referencing RDF format
> > information, seemed to be enough to remove the hardcoding that I have
> > been discussing at https://issues.apache.org/jira/browse/ANY23-83
> >
> > The changes fit my purposes as I can easily slot in the encoding and
> > mime detection code without pulling in the core or extractor modules,
> > and the supported types for the mime detection include any formats I
> > register with OpenRDF Rio, so it is extensible and modular for my
> > purposes.
> >
> > However, most of the changes are too large for easy patching, and I
> > didn't arrange the changes into nice patches throughout as I was not
> > sure what was going to happen in the end. I have submitted two very
> > small patches to that issue, but there could be many more eventually
> > if the redesigned code is acceptable.
> >
> > Note, I also removed the Any23 NQuads implementation as it was missing
> > Factory implementations for the writer and parser classes, so it wasn't
> > being picked up by Rio.createParser or any of the other static Rio
> > methods. I replaced it with the NQuads implementation from Sesametools,
> > which includes these factories and so is recognised. When
> > http://www.openrdf.org/issues/browse/SES-802 gets implemented, both of
> > these implementations will likely be deprecated anyway, so it wasn't a
> > major issue for me. I would suggest in either case splitting out the
> > NQuads classes into a separate module and implementing a Factory for
> > both the parser and writer so they are picked up by SPI.
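
For readers unfamiliar with the SPI approach described above, here is a
minimal sketch of a registry that discovers factories through
java.util.ServiceLoader instead of a hardcoded list. All of the type names
(QuadWriterFactory, NQuadsWriterFactory, QuadWriterRegistry) are hypothetical
and chosen for illustration only; they are not the Any23, Rio or Sesametools
classes, and the real WriterFactoryRegistry may look quite different.

```java
import java.util.Map;
import java.util.Optional;
import java.util.ServiceLoader;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical provider interface; in a modular build it would sit in the api module. */
interface QuadWriterFactory {
    /** Short name used to look the factory up, e.g. "nquads". */
    String formatName();

    /** Serialises one quad of IRIs as a single line of text. */
    String write(String subject, String predicate, String object, String graph);
}

/** Hypothetical implementation that a separate nquads module could ship. */
class NQuadsWriterFactory implements QuadWriterFactory {
    @Override
    public String formatName() {
        return "nquads";
    }

    @Override
    public String write(String subject, String predicate, String object, String graph) {
        // N-Quads puts each IRI in angle brackets and ends the statement with " ."
        return "<" + subject + "> <" + predicate + "> <" + object + "> <" + graph + "> .";
    }
}

/** Registry with no compile-time knowledge of any concrete factory. */
final class QuadWriterRegistry {
    private final Map<String, QuadWriterFactory> factories = new ConcurrentHashMap<>();

    QuadWriterRegistry() {
        // ServiceLoader reads provider class names from files named
        // META-INF/services/<fully qualified interface name> on the classpath.
        // With no such file present the loop simply finds nothing.
        for (QuadWriterFactory factory : ServiceLoader.load(QuadWriterFactory.class)) {
            register(factory);
        }
    }

    /** Manual registration, useful in tests or when no provider file is available. */
    void register(QuadWriterFactory factory) {
        factories.put(factory.formatName(), factory);
    }

    Optional<QuadWriterFactory> find(String formatName) {
        return Optional.ofNullable(factories.get(formatName));
    }
}
```

In a real build, the module that ships the implementation would also carry a
provider file under META-INF/services listing the implementation class, so
that adding the jar to the classpath is enough for the registry to see it.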
> >
> > There were some existing broken tests when I started, and there were a
> > small number of tests that broke throughout, including one that broke
> > when I updated to Tika-1.1. They are temporarily ignored, but can be
> > found easily by checking the ignored tests when running the test
> > suite.
> >
> > I hope the changes are useful to others.
> >
> > If you want to suggest changes to my version on GitHub feel free to
> > open an issue or fork the repository and send a pull request back.
> >
> > Cheers,
> >
> > Peter
> >
> > [1] https://github.com/ansell/any23
> >
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: [email protected]
> WWW: http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>

--
Michele Mostarda
Senior Software Engineer
skype: michele.mostarda
twitter: micmos
mail: [email protected]
site: http://www.michelemostarda.com
