No, I had not seen that, thanks! Looks very interesting!
ajs6f

Phil Coates wrote on 9/5/17 11:04 AM:
Have you looked at CM-Well (https://github.com/thomsonreuters/CM-Well)? This is based on Cassandra and ElasticSearch.

*Philip Coates*
[email protected]
[email protected]
skype:philip.coates.76
Tel: +44 (0)7711 818384
*SemanticIntegration* <http://www.semanticintegration.co.uk/>

On 5 September 2017 at 15:40, <[email protected]> wrote:

The requirements for distributed storage are actually that DRAS-TIC (see that grant description) be used, and DRAS-TIC is 100% based around Cassandra, so effectively, the requirement is that Cassandra be used, at least at core.

So part of what I am wondering (if it's not obvious) is "If we're going to have a Cassandra cluster as part of this, how can we get as much mileage as possible out of it?" I know that Cassandra offers some ordering capabilities out-of-the-box, although I'm not familiar with them. Maybe they could be used to support merge join generally.

CumulusRDF (as shown in that paper I forwarded) uses a structure in which they mostly leave column values empty. The information is stored entirely in the keys, and use is made of prefix lookup. Does your system do something like that, Claude? It sounds like you are storing tuple components in the column values.

ajs6f

Andy Seaborne wrote on 9/5/17 4:43 AM:

On Mon, Sep 4, 2017 at 12:10 PM, <[email protected]> wrote:

Little of both? :grin:

Primarily I am interested because of a grant [1] in which the Smithsonian Institution (where I work) is participating in a supporting role (partly because I convinced us to). That work involves using Cassandra for distributed storage, and it will also involve a distributed LDP implementation (the Fedora API referred to in that grant description is really just a packaging of Memento [2] with LDP [3]), hence my interest in jena-on-cassandra.

Turning this round - what are the requirements for the distributed storage?
As I understand the join question, the usual move with Cassandra is to denormalize and store the joined data together, but that's obviously nontrivial in our situation, where we don't know the potential queries. Have you looked at an indexing solution such as was used by CumulusRDF [4]?

(single graph example) If Cassandra has stored PSO and POS then parallel merge joins are possible.

Andy

ajs6f

[1] https://www.imls.gov/grants/awarded/lg-71-17-0159-17
[2] http://www.mementoweb.org/guide/quick-intro/
[3] https://www.w3.org/TR/ldp/
[4] http://iswc2011.semanticweb.org/fileadmin/iswc/Papers/Workshops/SSWS/Ladwig-et-all-SSWS2011.pdf

Claude Warren wrote on 9/2/17 12:44 PM:

are you looking to use jena-on-cassandra or do you have ideas? what leads you to ask about it?

On Sat, Sep 2, 2017 at 1:21 PM, <[email protected]> wrote:

Hey, Claude-- Just curious as to where https://github.com/Claudenw/jena-on-cassandra has ended up. Is that still work-in-progress?

-- ajs6f

--
I like: Like Like - The likeliest place on the web <http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren
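Editorial illustration of the two ideas discussed in this thread: key-only indexes answered by prefix lookup, and a merge join over POS and PSO orderings. This is a minimal in-memory sketch, not code from CumulusRDF, jena-on-cassandra, or Cassandra itself. Sorted Python lists stand in for ordered keys, and every name (`build_index`, `prefix_scan`, `merge_join`, the sample triples) is hypothetical. The join runs sequentially; the parallel variant mentioned in the thread is not attempted here.

```python
from bisect import bisect_left

def build_index(triples, order):
    """Permute (s, p, o) triples into `order` (e.g. "pos") and sort them.
    The sorted tuples play the role of keys; no values are stored."""
    slot = {"s": 0, "p": 1, "o": 2}
    return sorted(tuple(t[slot[c]] for c in order) for t in triples)

def prefix_scan(index, prefix):
    """Return all keys starting with `prefix`, in index order."""
    prefix = tuple(prefix)
    i = bisect_left(index, prefix)  # first key >= prefix
    hits = []
    while i < len(index) and index[i][:len(prefix)] == prefix:
        hits.append(index[i])
        i += 1
    return hits

def merge_join(left, right, left_key, right_key):
    """Join two lists that are each already sorted on their join key."""
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        lk, rk = left_key(left[i]), right_key(right[j])
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on each side, then pair them up.
            i_end = i
            while i_end < len(left) and left_key(left[i_end]) == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right_key(right[j_end]) == lk:
                j_end += 1
            out.extend((l, r) for l in left[i:i_end] for r in right[j:j_end])
            i, j = i_end, j_end
    return out

# Hypothetical sample data (single graph).
triples = [
    ("alice", "knows", "bob"),
    ("alice", "knows", "carol"),
    ("bob", "name", "Bob"),
    ("carol", "name", "Carol"),
    ("dave", "knows", "erin"),
]
pos = build_index(triples, "pos")
pso = build_index(triples, "pso")

# Pattern: ?x knows ?y . ?y name ?n
# The POS scan for "knows" comes back ordered by object (?y);
# the PSO scan for "name" comes back ordered by subject (?y).
left = prefix_scan(pos, ("knows",))   # keys are (p, o, s)
right = prefix_scan(pso, ("name",))   # keys are (p, s, o)
pairs = merge_join(left, right, lambda k: k[1], lambda k: k[1])
results = [(l[2], l[1], r[2]) for l, r in pairs]  # (?x, ?y, ?n)
```

Because both scans are already ordered on the join variable, the join needs only one pass over each side, with no sorting or hashing at query time. Whether a given Cassandra schema can deliver rows in that order depends on how its partition and clustering keys are laid out, which this sketch does not model.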
