Hi everyone!

My name is Brad and I'm based in Australia. For a few months now I've been developing Rya full-time as part of a comprehensive evaluation of Semantic Web technologies, and of Rya in particular, for our organisation. We're experienced users of Accumulo. We had some policy issues to overcome with regard to contributing to the Apache project, but those are now resolved. I've been in contact with Adina along the way.
Rya seems pretty awesome, but it is held back by a lack of documentation, some unclean code and a few rough edges in getting started. For example, we could hook it up with Fluo Muchos to make it super easy for new people to spin up a working Rya cluster on an AWS or Azure cloud. My impression of Rya is that it is quite feature complete, but needs some work to be much more friendly to new adopters.

I put up a pull request last week that updated the Maven dependencies of the project. Any help reviewing that would be appreciated. I know you're all busy so there is no great rush, but I'd love to collaborate and hear your priorities too.

I'm about 70 commits deep into my work on Rya in our organisation's code repository, so I've been pretty busy. I'm now trying to finalise some changes. I've been testing the performance against the original code in a small test cluster: for some queries I've made Rya much faster, and for others, slower. I'm working on more changes which I think should improve it further. I've started testing against the LUBM 5000 dataset, DBPedia and OpenPermID. I'm new to the world of Semantic Web, but fortunately I have some experienced colleagues helping me along the way.

I've been marking tickets in Jira as I work on them, and I'm trying to publish my pull requests onto GitHub faster. Hopefully a bunch will start appearing soon. Please expect a large pull request soon that changes Rya to use data types that align better with RDF4J, but otherwise doesn't change functionality. I have a refactor of the Accumulo DAO that is cleaner and (once finished, hopefully much) faster. I have fixed a number of other tickets and improved some of the doco and configuration files. I'll try to make the pull requests clean and reviewable, but unfortunately many of the improvements I'm making depend on other improvements I've made, so it's a bit tricky to disentangle them.
Some improvements I'll be putting up shortly also include:

- Enhance accumulo.rya to support the use of bloom filter
- Make timeout for SPARQL query configurable
- Add an IPAddressRyaTypeResolver
- NumberFormatException for large integers
- Tomcat configuration for indexers
- etc

If anyone with more Rya experience wants to request particular features or functionality to be worked on, I'd love to hear from you. We're particularly interested in scaling Rya to very large data sets (so performance is very important to us) and in making Rya more generic in reading from other (pre-existing) Accumulo table layouts. I also want to fix reliability issues around indexing configuration and consistency of tables (for example, is there a MapReduce job that repairs the indexes if data is written from a misconfigured client?).

I hope to hear from you, along with your thoughts on the future directions of Rya.

Brad