Hi everyone,

I've created a draft PR:
https://github.com/apache/rya/pull/317

I'm after some background, opinions and feedback about how to improve 
the environment.properties space in Rya.

I'm not really clear where this should end up, so please let me know 
your thoughts.

Brad


On 6/07/2020 7:12 pm, Brad Rushworth wrote:
> Hi everyone!
>
> My name is Brad and I'm based in Australia. I've been developing Rya
> full-time for a few months now as part of a comprehensive evaluation of
> Semantic Web technologies, and of Rya in particular, for our organisation.
> We're experienced users of Accumulo. We had some policy issues to
> overcome regarding contributing to the Apache project, but those are
> now resolved. I've been in contact with Adina along the way.
>
> Rya seems pretty awesome, but it is held back by a lack of
> documentation, some unclean code and a few rough edges when getting
> started. For example, we could hook it up with Fluo Muchos to make it
> super easy for new people to spin up a working Rya cluster on AWS or
> Azure. My impression is that Rya is quite feature-complete, but it
> needs some work to be much friendlier to new adopters.
>
> I put up a pull request last week that updated the Maven dependencies of
> the project. Any help reviewing that would be appreciated. I know you're
> all busy so there is no great rush, but I'd love to collaborate and hear
> your priorities too.
>
> I'm about 70 commits deep into my work on Rya in our organisation's code
> repository, so I've been pretty busy. I'm now trying to finalise some
> changes. I've been testing the performance against the original code in
> a small test cluster, and for some queries I've made Rya much faster,
> and for others, slower. I'm working on more changes which I think should
> improve it further. I've started testing against the LUBM 5000 dataset,
> DBpedia and OpenPermID.
>
> I'm new to the world of Semantic Web but fortunately I have some
> experienced colleagues helping me along the way. I've been marking
> tickets in Jira as I work on them, and I'm trying to publish my pull
> requests onto GitHub faster. Hopefully a bunch will start appearing soon.
>
> Please expect a large pull request soon that changes Rya to use data
> types that align better with RDF4J, but otherwise doesn't change
> functionality. I have a refactor of the Accumulo DAO that is cleaner and
> (once finished, hopefully much) faster. I have fixed a number of other
> tickets and improved some of the documentation and configuration files.
> I'll try to make the pull requests clean and reviewable, but
> unfortunately many of the improvements I'm making depend on other
> improvements I've made, so it's a bit tricky to disentangle them.
>
> Some improvements I'll be putting up shortly also include:
> - Enhance accumulo.rya to support the use of bloom filter
> - Make timeout for SPARQL query configurable
> - Add an IPAddressRyaTypeResolver
> - NumberFormatException for large integers
> - Tomcat configuration for indexers
> - etc.
>
> If anyone with more Rya experience wants to request particular features
> or functionality to be worked on, I'd love to hear from you. We're
> particularly interested in scaling Rya to very large data sets (thus
> performance is very important to us) and making Rya more generic in
> reading from other (pre-existing) Accumulo table layouts. I also want to
> fix reliability issues around indexing configuration and consistency of
> tables (for example, is there a MapReduce job that repairs the indexes
> if data is written from a misconfigured client?).
>
> I hope to hear from you, and your thoughts on the future directions of Rya.
>
> Brad
>
