Hey All,

As people have probably seen, the IP Clearance for the Hadoop RDF Tools donation is now complete, so we can officially adopt the code and start developing it further. This email is to get community input on what should be a priority and on any final clean-up tasks needed.
Firstly, in terms of clean-up, I think there are two things to do:

* Remove @author tags in Javadoc (this is a no-brainer and I'll do this later today)
* Rename the libraries

Re: renaming, I am concerned that by calling these libraries Hadoop RDF Tools we are falling foul of the ASF trademarks policy. To properly comply with the policy, it would be best to rename the libraries "Apache Jena RDF Tools for Apache Hadoop", making it clear that the Apache Jena project is responsible for them and that they target Apache Hadoop. Does this naming make sense?

Alongside this, I think we should also add the jena- prefix to all the Maven artifact IDs, e.g. hadoop-rdf-common -> jena-hadoop-rdf-common, again making it clear that these artifacts come from the Jena project. Although the org.apache.jena group ID mostly serves that purpose, it won't hurt to be thorough in this regard.

Secondly, there is the question of what's next development-wise. Those who have read the presentation attached to the associated JIRA (JENA-666) will know that it enumerated a bunch of future work based on Cray's thinking about the project. However, now that this is part of Apache Jena, the future direction should be driven by community needs.

The main directions for future work that I am personally considering right now are:

1. Clean-up and fixes to the existing code base
2. Native node and tuple containers, i.e. Paul Houle's suggestion of storing things as native types wherever possible and lazily translating to Node/Triple/Quad etc. only when necessary
3. Using binary comparisons wherever possible to avoid deserialisation costs and boost performance
4. Improving configuration, e.g. specifying namespaces to use for output

What would people in the community who are considering using these libraries like to see? Are there obvious things I've not thought of? Would you prioritise the above items differently? Any other thoughts?
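To make items 2 and 3 a little more concrete, here is a rough, self-contained sketch of the general idea. It is not code from the donated libraries: the names LazyNodeSketch, LazyNode and compareBytes are hypothetical, a decoded String stands in for a real Jena Node, and it assumes that plain unsigned byte order of the serialised form is an acceptable sort order. The compareBytes method uses the same six-argument shape as Hadoop's RawComparator.compare(byte[], int, int, byte[], int, int), so two keys can be ordered without deserialising either of them.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch only: a container that keeps the serialised form of a
 * node and decodes it lazily, plus a byte-level comparison that needs no
 * deserialisation at all.
 */
public class LazyNodeSketch {

    /** Holds raw bytes; the decoded value is produced on demand and cached. */
    public static final class LazyNode {
        private final byte[] raw;
        private String decoded; // stays null until get() is first called

        public LazyNode(byte[] raw) {
            this.raw = raw;
        }

        public byte[] raw() {
            return raw;
        }

        /** Decodes on the first call only; stands in for building a real Node. */
        public String get() {
            if (decoded == null) {
                decoded = new String(raw, StandardCharsets.UTF_8);
            }
            return decoded;
        }

        public boolean isDecoded() {
            return decoded != null;
        }
    }

    /**
     * Unsigned lexicographic comparison of two byte ranges. A shorter range
     * that is a prefix of the longer one sorts first.
     */
    public static int compareBytes(byte[] b1, int s1, int l1,
                                   byte[] b2, int s2, int l2) {
        int n = Math.min(l1, l2);
        for (int i = 0; i < n; i++) {
            int a = b1[s1 + i] & 0xff; // mask so bytes compare as unsigned
            int b = b2[s2 + i] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        return l1 - l2;
    }

    public static void main(String[] args) {
        LazyNode x = new LazyNode("<http://example.org/a>".getBytes(StandardCharsets.UTF_8));
        LazyNode y = new LazyNode("<http://example.org/b>".getBytes(StandardCharsets.UTF_8));

        // Ordering is decided purely on the raw bytes; nothing is decoded here.
        int cmp = compareBytes(x.raw(), 0, x.raw().length, y.raw(), 0, y.raw().length);
        System.out.println("compare = " + Integer.signum(cmp)); // prints -1
        System.out.println("decoded yet? " + x.isDecoded());    // prints false

        // Decoding only happens when the value is actually needed.
        System.out.println("value = " + x.get());
        System.out.println("decoded yet? " + x.isDecoded());    // prints true
    }
}
```

One design caveat: in a real Hadoop job any raw comparison has to agree with the ordering used by the deserialised objects, otherwise sorting and grouping during the shuffle would be inconsistent, so the serialisation format would need to be chosen with that in mind.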
Like all parts of Jena, this should be owned and maintained by the community as much as possible. If people have things they'd like to have a go at implementing themselves, then please go ahead. Feel free to drop emails to this list with questions, thoughts, requests for help, etc.

Cheers,
Rob
