Hi,

Yes, I also agree on this one. I marked Peter's suggestions for 0.8.0 as a realistic timescale for adoption. It seems a sensible approach; if we get the phasing and engineering right, I think this will certainly add to the project.
Lewis

On Sat, May 12, 2012 at 7:53 PM, Michele Mostarda <[email protected]> wrote:

> On 12 May 2012 18:34, Mattmann, Chris A (388J) <[email protected]> wrote:
>
>> Hi Peter,
>>
>> Thanks for your help and for a detailed explanation of what you did!
>>
>> I for one, would be super supportive if you had time to figure out a way
>> to get it into Apache Any23. I'm sure the rest of the PPMC would be happy
>> and willing to work with you to develop JIRA issues/patches, etc., to
>> facilitate this.
>>
>
> I would be happy!
> +1
>
> Mic
>
>>
>> Thank you again for your work!
>>
>
> Thanks.
> Mic
>
>
>>
>> Cheers,
>> Chris
>>
>> On May 10, 2012, at 8:41 PM, Peter Ansell wrote:
>>
>> > Hi all,
>> >
>> > Over the past two days I have split up Any23 into a variety of modules
>> > to make it easier to use different parts of the Any23 API. You can see
>> > the code at [1]. The current module list in the parent pom reactor
>> > looks like:
>> >
>> > <modules>
>> >   <module>api</module>
>> >   <module>csvutils</module>
>> >   <module>encoding</module>
>> >   <module>mime</module>
>> >   <module>core</module>
>> >   <module>test-resources</module>
>> >   <module>extractor</module>
>> >   <module>cli</module>
>> >   <module>test</module>
>> >   <module>service</module>
>> >   <module>plugins/basic-crawler</module>
>> >   <module>plugins/html-scraper</module>
>> >   <module>plugins/office-scraper</module>
>> >   <module>plugins/integration-test</module>
>> >   <module>sources-dist</module>
>> > </modules>
>> >
>> > All of the modules above core do not have dependencies on core, and
>> > the core module only has a dependency on the api module.
>> >
>> > The api module mostly contains interfaces but it also contains factory
>> > registries where they are fully Service Provider Interface (SPI)
>> > driven (Any23PluginManager and WriterFactoryRegistry which I created
>> > to alleviate the WriterRegistry hardcoding dependencies and
>> > reflection/annotation code that isn't easy to extend outside of the
>> > core library). The ExtractorRegistry was too difficult to convert to
>> > SPI just yet so I split it up into an interface and an implementation
>> > (ExtractorRegistryImpl) with the interface in the API module and used
>> > in some APIs where the singleton was previously used. These
>> > registries, together with Rio RDFFormat for referencing RDF format
>> > information, seemed to be enough to remove the hardcoding that I have
>> > been discussing at https://issues.apache.org/jira/browse/ANY23-83
>> >
>> > The changes fit my purposes as I can easily slot in the encoding and
>> > mime detection code without pulling in the core or extractor modules,
>> > and the supported types for the mime detection include any formats I
>> > register with OpenRDF Rio so it is extensible and modular for my
>> > purposes.
>> >
>> > However, most of the changes are too large for easy patching and I
>> > didn't arrange the changes into nice patches throughout as I was not
>> > sure what was going to happen in the end. I have submitted two very
>> > small patches to that issue, but there could be many more eventually
>> > if the redesigned code is acceptable.
>> >
>> > Note, I also removed the Any23 NQuads implementation as it was missing
>> > Factory implementations for the writer and parser classes so it wasn't
>> > being picked up by Rio.createParser or any of the other static Rio
>> > methods. I replaced it with the NQuads implementation from Sesametools
>> > which includes these factories and so is recognised. When
>> > http://www.openrdf.org/issues/browse/SES-802 gets implemented both of
>> > these implementations will likely be deprecated anyway so it wasn't a
>> > major issue for me. I would suggest in either case splitting out the
>> > NQuads classes into a separate module and implementing a Factory for
>> > both the parser and writer so they are picked up by SPI.
>> >
>> > There were some existing broken tests when I started, and there were a
>> > small number of tests that broke throughout, including one that broke
>> > when I updated to Tika-1.1. They are temporarily ignored, but can be
>> > found easily by checking the ignored tests when running the test
>> > suite.
>> >
>> > I hope the changes are useful to others.
>> >
>> > If you want to suggest changes to my version on GitHub feel free to
>> > open an issue or fork the repository and send a pull request back.
>> >
>> > Cheers,
>> >
>> > Peter
>> >
>> > [1] https://github.com/ansell/any23
>>
>>
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Chris Mattmann, Ph.D.
>> Senior Computer Scientist
>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 171-266B, Mailstop: 171-246
>> Email: [email protected]
>> WWW: http://sunset.usc.edu/~mattmann/
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Adjunct Assistant Professor, Computer Science Department
>> University of Southern California, Los Angeles, CA 90089 USA
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>
>>
>
>
> --
> Michele Mostarda
> Senior Software Engineer
> skype: michele.mostarda
> twitter: micmos
> mail: [email protected]
> site : http://www.michelemostarda.com

--
Lewis
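
For readers unfamiliar with the SPI-driven registries Peter describes, here is a minimal sketch of the general pattern using the JDK's java.util.ServiceLoader. It is not the Any23 or Rio code: the names SpiRegistrySketch, WriterFactory, Registry and NQuadsWriterFactory are hypothetical, and the MIME type string is only illustrative. The idea is that the registry discovers whatever factory implementations are listed in a META-INF/services provider-configuration file on the classpath, so nothing has to be hardcoded in the core library.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.ServiceLoader;

public class SpiRegistrySketch {

    /** Hypothetical service interface; not the real Any23 or Rio API. */
    public interface WriterFactory {
        String getMimeType();
        String describe();
    }

    /** Registry filled by ServiceLoader discovery, with manual registration on top. */
    public static final class Registry {
        private final Map<String, WriterFactory> byMimeType = new LinkedHashMap<>();

        public Registry() {
            // Picks up every implementation named in a provider-configuration file
            // META-INF/services/SpiRegistrySketch$WriterFactory on the classpath.
            // With no such file present, the loop simply does nothing.
            for (WriterFactory factory : ServiceLoader.load(WriterFactory.class)) {
                register(factory);
            }
        }

        public void register(WriterFactory factory) {
            byMimeType.put(factory.getMimeType(), factory);
        }

        public Optional<WriterFactory> get(String mimeType) {
            return Optional.ofNullable(byMimeType.get(mimeType));
        }
    }

    /** Illustrative factory; the MIME type string is an assumption. */
    public static final class NQuadsWriterFactory implements WriterFactory {
        @Override public String getMimeType() { return "text/x-nquads"; }
        @Override public String describe() { return "N-Quads writer (sketch)"; }
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        // Registered by hand here so the sketch runs without a services file.
        registry.register(new NQuadsWriterFactory());
        System.out.println(registry.get("text/x-nquads")
                .map(WriterFactory::describe)
                .orElse("no factory registered"));
    }
}
```

In a real multi-module build, each module that ships a factory would also ship its own provider-configuration file, which is what lets a separate NQuads module be picked up without the api or core modules knowing about it.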
