My thanks, Peter, for the contribution.

I am outside the technical work but this seems like a move that can
only increase the overall usefulness.
I hope this will be considered seriously by the others here.

Gio

On Fri, May 11, 2012 at 11:21 AM, Lewis John Mcgibbney
<[email protected]> wrote:
> Hi Peter,
>
> As I said on the issue at [1], this looks like exciting work. I'm
> hoping that it sparks some conversation amongst us.
>
> As we are pushing for our first incubating release I'm not entirely
> sure that the restructuring is a viable option just now, however we
> should certainly not rule it out unless there is a justified argument.
>
> Thank you for the heads up on this.
>
> Lewis
>
> On Fri, May 11, 2012 at 7:41 AM, Peter Ansell <[email protected]> wrote:
>> Hi all,
>>
>> Over the past two days I have split up Any23 into a variety of modules
>> to make it easier to use different parts of the Any23 API. You can see
>> the code at [1]. The current module list in the parent pom reactor
>> looks like:
>>
>>  <modules>
>>    <module>api</module>
>>    <module>csvutils</module>
>>    <module>encoding</module>
>>    <module>mime</module>
>>    <module>core</module>
>>    <module>test-resources</module>
>>    <module>extractor</module>
>>    <module>cli</module>
>>    <module>test</module>
>>    <module>service</module>
>>    <module>plugins/basic-crawler</module>
>>    <module>plugins/html-scraper</module>
>>    <module>plugins/office-scraper</module>
>>    <module>plugins/integration-test</module>
>>    <module>sources-dist</module>
>>  </modules>
>>
>> None of the modules listed above core depend on core, and
>> the core module depends only on the api module.
>>
>> The api module mostly contains interfaces but it also contains factory
>> registries where they are fully Service Provider Interface (SPI)
>> driven (Any23PluginManager and WriterFactoryRegistry which I created
>> to alleviate the WriterRegistry hardcoding dependencies and
>> reflection/annotation code that isn't easy to extend outside of the
>> core library). The ExtractorRegistry was too difficult to convert to
>> SPI just yet so I split it up into an interface and an implementation
>> (ExtractorRegistryImpl) with the interface in the API module and used
>> in some APIs where the singleton was previously used. These
>> registries, together with Rio RDFFormat for referencing RDF format
>> information, seemed to be enough to remove the hardcoding that I have
>> been discussing at https://issues.apache.org/jira/browse/ANY23-83
>>
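For readers unfamiliar with the SPI approach described above, here is a
minimal sketch of a registry that discovers its factories through
java.util.ServiceLoader instead of a hardcoded list. All type and
method names below (WriterFactory, FactoryRegistry, LineWriterFactory,
formatName, write) are hypothetical and are not taken from the Any23
code; the real WriterFactoryRegistry may well look different.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.ServiceLoader;

// Hypothetical names throughout; none of these types come from Any23.
final class SpiRegistrySketch {

    /** Service interface; in a modular layout it would live in the api module. */
    interface WriterFactory {
        String formatName();             // key used for lookup, e.g. "nquads"
        String write(String statement);  // stand-in for creating a real writer
    }

    /** Registry that fills itself from the classpath rather than a fixed list. */
    static final class FactoryRegistry {
        private final Map<String, WriterFactory> factories = new LinkedHashMap<>();

        FactoryRegistry() {
            // ServiceLoader looks for provider-configuration files named after
            // the service interface under META-INF/services on the classpath.
            for (WriterFactory factory : ServiceLoader.load(WriterFactory.class)) {
                register(factory);
            }
        }

        /** Manual registration, useful in tests or when no services file exists. */
        void register(WriterFactory factory) {
            factories.put(factory.formatName(), factory);
        }

        Optional<WriterFactory> get(String formatName) {
            return Optional.ofNullable(factories.get(formatName));
        }
    }

    /** An implementation that a separate plugin module could ship. */
    static final class LineWriterFactory implements WriterFactory {
        @Override public String formatName() { return "line"; }
        @Override public String write(String statement) { return statement + " ."; }
    }

    public static void main(String[] args) {
        FactoryRegistry registry = new FactoryRegistry();
        // Registered by hand here because this sketch ships no services file.
        registry.register(new LineWriterFactory());
        System.out.println(
            registry.get("line").map(f -> f.write("s p o")).orElse("no factory"));
    }
}
```

In a real build, a plugin jar would list its factory class in a
provider-configuration file under META-INF/services, so the registry
would pick it up without any code change in the core library.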
>> The changes fit my purposes as I can easily slot in the encoding and
>> mime detection code without pulling in the core or extractor modules,
>> and the supported types for the mime detection include any formats I
>> register with OpenRDF Rio so it is extensible and modular for my
>> purposes.
>>
>> However, most of the changes are too large for easy patching and I
>> didn't arrange the changes into nice patches throughout as I was not
>> sure what was going to happen in the end. I have submitted two very
>> small patches to that issue, but there could be many more eventually
>> if the redesigned code is acceptable.
>>
>> Note, I also removed the Any23 NQuads implementation as it was missing
>> Factory implementations for the writer and parser classes so it wasn't
>> being picked up by Rio.createParser or any of the other static Rio
>> methods. I replaced it with the NQuads implementation from Sesametools
>> which includes these factories and so is recognised. When
>> http://www.openrdf.org/issues/browse/SES-802 gets implemented both of
>> these implementations will likely be deprecated anyway so it wasn't a
>> major issue for me. I would suggest in either case splitting out the
>> NQuads classes into a separate module and implementing a Factory for
>> both the parser and writer so they are picked up by SPI.
>>
>> There were some existing broken tests when I started, and there were a
>> small number of tests that broke throughout, including one that broke
>> when I updated to Tika-1.1. They are temporarily ignored, but can be
>> found easily by checking the ignored tests when running the test
>> suite.
>>
>> I hope the changes are useful to others.
>>
>> If you want to suggest changes to my version on GitHub feel free to
>> open an issue or fork the repository and send a pull request back.
>>
>> Cheers,
>>
>> Peter
>>
>> [1] https://github.com/ansell/any23
>
>
>
> --
> Lewis
