Hi all,

I have created a set of 3 related patches to split out the mime and nquads functionality into separate modules that do not depend on the core module. This will make it easier for people to reuse both of these modules. The main goal for the patch was to split out the mime module so that it can be used without pulling in everything from core, but that requires the nquads module to be split out at the same time to enable tests to be moved from core/src/test/java to mime/src/test/java.

Unfortunately there is a single class, CSVReaderBuilder, that doesn't seem to fit in the api module, and is required by the mime type detector to detect csv documents. I created a separate module called csvutils that has dependencies on Apache Commons CSV and the api module. It needs the api module to pull in a DefaultConfiguration object.

In addition, I removed the compile time dependency for the core module on the nquads module, so that it can be interchangeable with a Sesame Rio N-Quads module when the CLAs to Aduna/Vound are completed and I get time to implement it. I switched off the hard dependencies on most of the Rio modules where possible; however, there are still two cases where hard dependencies remain. Firstly, the N-Quads implementation has a compile time dependency on the Sesame Rio NTriples classes. This does not cause a hard dependency on ntriples, as the nquads module itself is a runtime dependency for core and mime. Secondly, the only hard dependency on a parser module for core is through a custom TurtleParser extension class that is used in RDFParserFactory to set the base prefix to the baseURI when parsing, and it is not apparent how that could be fixed, as there is no "on parse start" hook for RDFParser.

These changes resulted in a large number of minor changes to standardise references to Rio RDFParser using Rio.createParser wherever possible. These include some references to RDFParserBase, which is an implementation class and not in the basic OpenRDF Rio API.

The changes also match the Any23 preferred mime type for N-Quads with the mime type given in the initial specification, i.e., "text/x-nquads". The alternative mime types are still supported, but the Tika configuration now returns text/x-nquads if it is given one of the aliases. In addition, the patch also switches the Tika Turtle mime type to the type contained in the W3C Team Submission, "text/turtle".
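
To make the alias handling concrete, here is a rough sketch in plain Java of the behaviour I am aiming for: an alias resolves to the preferred mime type, and a comparison succeeds against either the preferred type or any alias. This is not the actual Tika configuration or Sesame code. The class MimeAliasSketch and its methods are hypothetical names used only for illustration, and the "text/nquads" alias is an assumed example value.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Any23, Tika or Sesame.
final class MimeAliasSketch {

    // Maps every known mime type (preferred or alias) to its preferred form.
    private static final Map<String, String> ALIAS_TO_PREFERRED = new HashMap<>();

    static {
        // "text/nquads" is an assumed alias, used here only as an example.
        register("text/x-nquads", "text/nquads");
        register("text/turtle", "application/x-turtle");
    }

    private static void register(String preferred, String... aliases) {
        ALIAS_TO_PREFERRED.put(preferred, preferred);
        for (String alias : aliases) {
            ALIAS_TO_PREFERRED.put(alias, preferred);
        }
    }

    // Returns the preferred mime type for a known alias; unknown types are
    // returned trimmed and lower-cased but otherwise unchanged.
    static String canonical(String mimeType) {
        String key = mimeType.trim().toLowerCase(Locale.ROOT);
        return ALIAS_TO_PREFERRED.getOrDefault(key, key);
    }

    // True if both names refer to the same format, whether given as the
    // preferred type or as one of its aliases.
    static boolean matches(String expected, String candidate) {
        return canonical(expected).equals(canonical(candidate));
    }

    public static void main(String[] args) {
        System.out.println(canonical("application/x-turtle"));
        System.out.println(matches("text/x-nquads", "text/nquads"));
    }
}
```

The point of routing every comparison through one lookup is that callers never need to hardcode a particular spelling of the mime type.
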
As with N-Quads, the Tika configuration for Turtle still contains the alternative mime type "application/x-turtle", but should now return "text/turtle" instead of the alias. There were a large number of places throughout the Any23 codebase where these mime types were hardcoded, so to reduce that number and make things manageable I switched to using RDFFormat.getDefaultMimeType(), and RDFFormat.hasMimeType, which checks against both the default and any alternative mime types defined in the RDFFormat that is being referenced.

One other change that may affect some operations is the switch in NQuadsParser from using the default user locale to determine the charset for InputStreams to explicitly using "UTF-8", which may have been what was desired in the past anyway. I also switched the order of parsing in NQuadsParser to avoid importing the custom Any23 ReaderInputStream class, instead using the standard Java InputStreamReader class to focus the parse process on Reader instead of InputStream.

You can review the patches at:

https://github.com/ansell/any23/compare/ansell:trunk...ansell:mime-module

Either commenting inline at GitHub or here on the mailing list is fine with me.

The patches relate to three ANY23 Jira issues:

* ANY23-85 : Splitting out the NQuads parser and writer into its own module
* ANY23-117 : Split out mime type detection into its own module
* ANY23-83 : Removing hardcoded formats to make Any23 more flexible as a modular library

Although the branch contains three independent patches, I did not initially create them that way, so they may contain bugs if you test them individually. In particular, there are references to the csvutils and mime modules in the nquads patch. If necessary I could refactor them further, but if all three are okay I will submit them all at the same time.

Cheers,

Peter
