Re: Splitting up Any23 into a more modular format

Lewis John Mcgibbney Sat, 12 May 2012 13:36:01 -0700

Hi,

Yes, I also agree on this one. I have marked Peter's suggestions for 0.8.0,
which seems a realistic timescale for adoption. It seems a sensible approach;
if we get the phasing and engineering right, I think this will
certainly add to the project.


Lewis

On Sat, May 12, 2012 at 7:53 PM, Michele Mostarda
<[email protected]> wrote:
> On 12 May 2012 18:34, Mattmann, Chris A (388J) <
> [email protected]> wrote:
>
>> Hi Peter,
>>
>> Thanks for your help and for a detailed explanation of what you did!
>>
>> I, for one, would be super supportive if you had time to figure out a way
>> to get it into Apache Any23. I'm sure the rest of the PPMC would be happy
>> and willing to work with you to develop JIRA issues/patches, etc., to
>> facilitate this.
>>
>
> I would be happy!
> +1
>
> Mic
>
>>
>> Thank you again for your work!
>>
>
> Thanks.
> Mic
>
>
>>
>> Cheers,
>> Chris
>>
>> On May 10, 2012, at 8:41 PM, Peter Ansell wrote:
>>
>> > Hi all,
>> >
>> > Over the past two days I have split up Any23 into a variety of modules
>> > to make it easier to use different parts of the Any23 API. You can see
>> > the code at [1]. The current module list in the parent pom reactor
>> > looks like:
>> >
>> >  <modules>
>> >    <module>api</module>
>> >    <module>csvutils</module>
>> >    <module>encoding</module>
>> >    <module>mime</module>
>> >    <module>core</module>
>> >    <module>test-resources</module>
>> >    <module>extractor</module>
>> >    <module>cli</module>
>> >    <module>test</module>
>> >    <module>service</module>
>> >    <module>plugins/basic-crawler</module>
>> >    <module>plugins/html-scraper</module>
>> >    <module>plugins/office-scraper</module>
>> >    <module>plugins/integration-test</module>
>> >    <module>sources-dist</module>
>> >  </modules>
>> >
>> > None of the modules listed above core depend on core, and
>> > the core module depends only on the api module.
>> >
>> > The api module mostly contains interfaces, but it also contains factory
>> > registries that are fully driven by the Service Provider Interface
>> > (SPI): Any23PluginManager, and WriterFactoryRegistry, which I created
>> > to remove WriterRegistry's hardcoded dependencies and its
>> > reflection/annotation code, which isn't easy to extend outside of the
>> > core library. The ExtractorRegistry was too difficult to convert to
>> > SPI just yet, so I split it into an interface and an implementation
>> > (ExtractorRegistryImpl), put the interface in the api module, and used
>> > it in some APIs where the singleton was previously used. These
>> > registries, together with Rio's RDFFormat for referencing RDF format
>> > information, seemed to be enough to remove the hardcoding that I have
>> > been discussing at https://issues.apache.org/jira/browse/ANY23-83
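For readers less familiar with the SPI approach, here is a minimal sketch of an SPI-driven registry. It assumes a simplified, hypothetical WriterFactory interface (not the actual Any23 API): implementations are discovered with java.util.ServiceLoader from a META-INF/services file, and can also be registered programmatically, so new writers can be added without editing a hardcoded list.

```java
import java.util.Map;
import java.util.Optional;
import java.util.ServiceLoader;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Sketch of an SPI-driven writer factory registry.
 * WriterFactory is a hypothetical, simplified interface, not Any23's real one.
 */
public class WriterFactoryRegistrySketch {

    /** Hypothetical service interface; implementations would be listed in
     *  META-INF/services/WriterFactoryRegistrySketch$WriterFactory. */
    public interface WriterFactory {
        String getIdentifier();   // e.g. "ntriples"
        String getMimeType();     // e.g. "text/plain"
    }

    private final Map<String, WriterFactory> byIdentifier = new ConcurrentHashMap<>();

    /** Discovers every WriterFactory provider on the classpath via ServiceLoader. */
    public static WriterFactoryRegistrySketch loadFromClasspath() {
        WriterFactoryRegistrySketch registry = new WriterFactoryRegistrySketch();
        for (WriterFactory factory : ServiceLoader.load(WriterFactory.class)) {
            registry.register(factory);
        }
        return registry;
    }

    /** Programmatic registration, e.g. for tests or embedded use. */
    public void register(WriterFactory factory) {
        byIdentifier.put(factory.getIdentifier(), factory);
    }

    /** Looks up a factory by identifier; empty if nothing registered it. */
    public Optional<WriterFactory> getByIdentifier(String id) {
        return Optional.ofNullable(byIdentifier.get(id));
    }

    public static void main(String[] args) {
        WriterFactoryRegistrySketch registry = loadFromClasspath();
        // With no service file present, nothing is discovered, so add one by hand.
        registry.register(new WriterFactory() {
            public String getIdentifier() { return "ntriples"; }
            public String getMimeType() { return "text/plain"; }
        });
        System.out.println(registry.getByIdentifier("ntriples")
                .map(WriterFactory::getMimeType).orElse("none"));
    }
}
```

The registry itself names no concrete writer, which is the property that makes it extensible from outside the core library.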
>> >
>> > The changes fit my purposes as I can easily slot in the encoding and
>> > mime detection code without pulling in the core or extractor modules,
>> > and the supported types for the mime detection include any formats I
>> > register with OpenRDF Rio so it is extensible and modular for my
>> > purposes.
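A sketch of why detection becomes extensible when the supported MIME types come from a format registry rather than a hardcoded list. The Format class and RegistryBackedMimeDetector below are hypothetical stand-ins for the role Rio's format registry plays in the setup described above; Content-Type parameters such as charset are stripped before lookup.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Sketch: a MIME detector whose supported types come entirely from registered
 * formats. All names here are hypothetical, not Any23 or Rio classes.
 */
public class RegistryBackedMimeDetector {

    /** Hypothetical format descriptor: a name plus the MIME types it answers to. */
    public static final class Format {
        public final String name;
        public final List<String> mimeTypes;

        public Format(String name, String... mimeTypes) {
            this.name = name;
            this.mimeTypes = Arrays.asList(mimeTypes);
        }
    }

    private final Map<String, Format> byMimeType = new ConcurrentHashMap<>();

    /** Registering a format is all it takes to make it detectable. */
    public void register(Format format) {
        for (String mime : format.mimeTypes) {
            byMimeType.put(normalise(mime), format);
        }
    }

    /** Resolves a Content-Type value such as "text/turtle; charset=UTF-8". */
    public Optional<Format> detect(String contentType) {
        if (contentType == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(byMimeType.get(normalise(contentType)));
    }

    /** Drops any parameters after ';' and lower-cases the type/subtype. */
    private static String normalise(String contentType) {
        int semicolon = contentType.indexOf(';');
        String base = semicolon >= 0 ? contentType.substring(0, semicolon) : contentType;
        return base.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        RegistryBackedMimeDetector detector = new RegistryBackedMimeDetector();
        detector.register(new Format("Turtle", "text/turtle", "application/x-turtle"));
        detector.register(new Format("N-Quads", "application/n-quads"));
        System.out.println(detector.detect("Text/Turtle; charset=UTF-8")
                .map(f -> f.name).orElse("unknown"));
    }
}
```

Running main prints Turtle; a format registered later by some other module would be picked up the same way without changing the detector.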
>> >
>> > However, most of the changes are too large for easy patching, and I
>> > didn't organise them into clean, incremental patches as I went, since I
>> > wasn't sure what was going to happen in the end. I have submitted two very
>> > small patches to that issue, but there could be many more eventually
>> > if the redesigned code is acceptable.
>> >
>> > Note, I also removed the Any23 NQuads implementation as it was missing
>> > Factory implementations for the writer and parser classes so it wasn't
>> > being picked up by Rio.createParser or any of the other static Rio
>> > methods. I replaced it with the NQuads implementation from Sesametools
>> > which includes these factories and so is recognised. When
>> > http://www.openrdf.org/issues/browse/SES-802 gets implemented, both of
>> > these implementations will likely be deprecated anyway, so it wasn't a
>> > major issue for me. I would suggest in either case splitting out the
>> > NQuads classes into a separate module and implementing a Factory for
>> > both the parser and writer so they are picked up by SPI.
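The point about factories can be sketched as follows. In an SPI-based framework such as Rio, the framework does not instantiate parser classes directly; it iterates over discovered factories and asks each one for a parser, so a parser class with no registered factory is invisible. The interfaces below are hypothetical simplifications of that pattern, not Rio's actual types, and the service file name in the comment is what ServiceLoader would look for given these particular names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

/**
 * Sketch of the "one factory per format" pattern used for SPI discovery.
 * QuadParser and QuadParserFactory are hypothetical, not Rio interfaces.
 */
public class NQuadsFactorySketch {

    /** Hypothetical parser type; real parsing logic is omitted. */
    public interface QuadParser {
        String getFormatName();
    }

    /** Hypothetical factory type that the framework discovers via SPI. */
    public interface QuadParserFactory {
        String getFormatName();
        QuadParser getParser();
    }

    public static final class NQuadsParser implements QuadParser {
        public String getFormatName() { return "N-Quads"; }
    }

    /** ServiceLoader needs a public class with a public no-arg constructor. */
    public static final class NQuadsParserFactory implements QuadParserFactory {
        public NQuadsParserFactory() {}
        public String getFormatName() { return "N-Quads"; }
        public QuadParser getParser() { return new NQuadsParser(); } // fresh instance per call
    }

    /** Roughly what a framework does internally: ask each discovered factory. */
    public static QuadParser createParser(Iterable<? extends QuadParserFactory> factories,
                                          String formatName) {
        for (QuadParserFactory factory : factories) {
            if (factory.getFormatName().equalsIgnoreCase(formatName)) {
                return factory.getParser();
            }
        }
        throw new IllegalArgumentException("No parser factory registered for " + formatName);
    }

    public static void main(String[] args) {
        // Discovery would be driven by a file named
        // META-INF/services/NQuadsFactorySketch$QuadParserFactory
        // containing the line: NQuadsFactorySketch$NQuadsParserFactory
        List<QuadParserFactory> discovered = new ArrayList<>();
        ServiceLoader.load(QuadParserFactory.class).forEach(discovered::add);
        if (discovered.isEmpty()) {
            discovered.add(new NQuadsParserFactory()); // fallback for this standalone demo
        }
        System.out.println(createParser(discovered, "n-quads").getFormatName());
    }
}
```

Without the service file on the classpath, main falls back to the manually added factory and prints N-Quads; remove that fallback and createParser throws, which mirrors why a parser lacking a factory is not picked up.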
>> >
>> > There were some existing broken tests when I started, and a small
>> > number of tests broke along the way, including one that broke when I
>> > updated to Tika-1.1. They are temporarily ignored, but can be found
>> > easily by checking the ignored tests when running the test suite.
>> >
>> > I hope the changes are useful to others.
>> >
>> > If you want to suggest changes to my version on GitHub feel free to
>> > open an issue or fork the repository and send a pull request back.
>> >
>> > Cheers,
>> >
>> > Peter
>> >
>> > [1] https://github.com/ansell/any23
>>
>>
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Chris Mattmann, Ph.D.
>> Senior Computer Scientist
>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 171-266B, Mailstop: 171-246
>> Email: [email protected]
>> WWW:   http://sunset.usc.edu/~mattmann/
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Adjunct Assistant Professor, Computer Science Department
>> University of Southern California, Los Angeles, CA 90089 USA
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>
>>
>
>
> --
> Michele Mostarda
> Senior Software Engineer
> skype: michele.mostarda
> twitter: micmos
> mail: [email protected]
> site : http://www.michelemostarda.com



-- 
Lewis
