We don't have code at the moment. We (my team at work) are planning to implement this on Cassandra. That would mean a couple of developers watching, and at least one working on the code, until it is stable.
I was hoping that we would be able to contribute this to the Jena project as a complete module. I understand not wanting to put it in as part of the project at the beginning, but that was my goal. I don't have a release schedule in mind, as the in-house project is still fluid. It might make sense to put it on GitHub to start, but I would like to see it in a Jena-based repo in order to make it more visible to the development community. As I keep saying, I need to get final approval from legal before proceeding. I expect to hear something later this week.

Claude

On Mon, Oct 31, 2016 at 5:53 PM, Andy Seaborne <a...@apache.org> wrote:

> On 31/10/16 13:41, Claude Warren wrote:
>
>> Andy,
>>
>> This seems like a good approach but does not appear to be in the Jena code
>> base, which I suppose is your comment about an approach to developing
>> work.
>>
>> Does it make sense to create git clones that contain the new work? Or
>> perhaps branches?
>>
>> Do you have a suggestion or direction you would like to see this go?
>
> That's the discussion to have. The first item is "Community". This is
> all new code? Who is involved? Just you so far?
>
> A storage layer is not trivial - this is not an "extra" thing. It is a
> module of its own, and if the community is significantly different, maybe
> a different mailing list (e.g. Solr within the Lucene
> project), maybe even a different project; it can be "straight to TLP" or
> "incubated" - that depends on who is involved. There is a wide set of
> possibilities.
>
> If it is starting off, then the Jena git repo isn't a good place to have
> the code. The lifecycles don't line up.
>
> A branch that is completely separate is really a separate repo. Jena can
> get another git repo.
>
> What would be the release cycle?
> The real issue is the work needed by the PMC for releases.
>
> To get all options mentioned:
>
> If this is a one-person effort for now, then starting a GitHub repo and
> creating the initial sketch/framework is an option. More focused. More
> freedom to try things out and change directions.
>
> Andy
>
>> Claude
>>
>> On Fri, Oct 28, 2016 at 2:35 PM, Andy Seaborne <a...@apache.org> wrote:
>>
>>> Claude,
>>>
>>> These may help:
>>>
>>> I have been thinking about an interface that is more oriented to the
>>> storage than the full DatasetGraph.
>>>
>>> StorageRDF breaks down all the operations into those on the default graph
>>> and those on named graphs. For just a graph, simply ignore the named
>>> graph operations.
>>>
>>> https://github.com/afs/AFS-Dev/blob/master/src/main/java/projects/dsg2/storage/StorageRDF.java
>>>
>>> There is an adapter to the DatasetGraph hierarchy (which is needed for
>>> SPARQL):
>>>
>>> https://github.com/afs/AFS-Dev/blob/master/src/main/java/projects/dsg2/DatasetGraphStorage.java
>>>
>>> If you want to only use existing classes, DatasetGraphTriplesQuads is the
>>> place to start - used by TIM and TDB - you can implement without needing
>>> quads/named graphs. Again, simply ignore (throw
>>> UnsupportedOperationException for the named graph calls).
>>>
>>> Going the graph route could lead to rework later on for any kind of
>>> performance issues because find(S,P,O) is so narrow and precludes union
>>> default graph except by brute force. DatasetGraph works with the SPARQL
>>> execution engine.
>>>
>>> We still need to discuss how best to approach developing work - it should
>>> not get sucked up by the release cycle.
>>>
>>> Andy
>>>
>>> On 26/10/16 19:21, Claude Warren wrote:
>>>
>>>> My plan is to start with a Graph implementation. We expect to write 3
>>>> tables: SPO, POS, OPS (I think). Currently we don't have an easy way to
>>>> handle find(ANY, ANY, ANY) so I suspect we will just start with
>>>> permitting a column scan on Cassandra.
>>>>
>>>> I have not looked at DynamoDB but as I recall there are significant
>>>> differences under the hood.
>>>>
>>>> I expect that we will move on to a custom model or query engine to get
>>>> the best performance but that is not what we are planning for the first
>>>> cut.
>>>>
>>>> I am still waiting for management approval to do this at work ....
>>>> sometimes it takes longer to get the paperwork done than it does to
>>>> design the thing.
>>>>
>>>> Claude
>>>>
>>>> On Mon, Oct 17, 2016 at 6:39 PM, Paul Houle <paul.ho...@ontology2.com>
>>>> wrote:
>>>>
>>>>> I like DynamoDB as a target for this sort of thing. There are many
>>>>> tasks which are small-scale yet critical where it would otherwise be
>>>>> hard to provide a distributed and reliable database. Put that together
>>>>> with Lambda, which does the same for computation, and you are cooking
>>>>> with gas.
>>>>>
>>>>> I wrote a 1-1 translation of DynamoDB documents to RDF that I use
>>>>> throughout an application; the code is DynamoDB idiomatic in every
>>>>> way, just the application reads and writes (a constrained set of) RDF
>>>>> documents.
>>>>>
>>>>> Right now I dump the documents from the DynamoDB system into a triple
>>>>> store when I want a panoptic view, but a distributed graph like
>>>>> that would mean being able to run SPARQL queries against DynamoDB
>>>>> directly.
>>>>>
>>>>> There are many products in the same family as Cassandra and DynamoDB
>>>>> and it would be good to think through the math so we can approach them
>>>>> all in a similar way.
>>>>>
>>>>> --
>>>>> Paul Houle
>>>>> paul.ho...@ontology2.com
>>>>>
>>>>> On Mon, Oct 17, 2016, at 12:31 PM, A. Soroka wrote:
>>>>>
>>>>>> Yep,
>>>>>>
>>>>>> http://iswc2011.semanticweb.org/fileadmin/iswc/Papers/Workshops/SSWS/Ladwig-et-all-SSWS2011.pdf
>>>>>>
>>>>>> indicates that they are indexing by subject.
>>>>>> As someone who has
>>>>>> implemented LDP, that is definitely the approach that makes sense
>>>>>> there.
>>>>>>
>>>>>> ---
>>>>>> A. Soroka
>>>>>> The University of Virginia Library
>>>>>>
>>>>>> On Oct 17, 2016, at 12:20 PM, Andy Seaborne <a...@apache.org> wrote:
>>>>>>
>>>>>>> IIRC it stores CBDs indexed by subject so it is the "other" model to
>>>>>>> Rya. Better for LDP (??).
>>>>>>>
>>>>>>> Andy
>>>>>>>
>>>>>>> On 17/10/16 15:41, A. Soroka wrote:
>>>>>>>
>>>>>>>> There's also:
>>>>>>>>
>>>>>>>> https://github.com/cumulusrdf/cumulusrdf
>>>>>>>>
>>>>>>>> in a similar vein (RDF over Cassandra). Not sure what kind of
>>>>>>>> particular uses it expects to support.
>>>>>>>>
>>>>>>>> ---
>>>>>>>> A. Soroka
>>>>>>>> The University of Virginia Library
>>>>>>>>
>>>>>>>> On Oct 17, 2016, at 7:02 AM, Andy Seaborne <a...@apache.org> wrote:
>>>>>>>>
>>>>>>>>> Hi Claude,
>>>>>>>>>
>>>>>>>>> There is certainly interest from me.
>>>>>>>>>
>>>>>>>>> The best thing to do depends on various factors. By putting
>>>>>>>>> it in extras I presume you mean it gets added to the release? That
>>>>>>>>> is not the only way forward.
>>>>>>>>>
>>>>>>>>> An important aspect of Apache is "Community over code" - will there
>>>>>>>>> be a community around this code? Is that community the same as, or
>>>>>>>>> significantly overlapping with, the Jena community?
>>>>>>>>>
>>>>>>>>> There are various reasons for wanting RDF over a column store -
>>>>>>>>> which use cases are the most important for this work?
>>>>>>>>>
>>>>>>>>> They lead to different ways of using Cassandra. For example,
>>>>>>>>> Rya (incubating) uses Accumulo tables as indexes, and partial scans
>>>>>>>>> of the table are streaming. Other systems try to use the columns for
>>>>>>>>> properties, possibly more useful for LDP style than SPARQL.
>>>>>>>>>
>>>>>>>>> Andy
>>>>>>>>>
>>>>>>>>> On 15/10/16 18:38, Claude Warren wrote:
>>>>>>>>>
>>>>>>>>>> Howdy,
>>>>>>>>>>
>>>>>>>>>> We have a project at work that is implementing Jena Graph on
>>>>>>>>>> Cassandra. I am wondering if there is enough interest here to
>>>>>>>>>> accept it as a contribution. I was thinking that it might fit in
>>>>>>>>>> the Extras category.
>>>>>>>>>>
>>>>>>>>>> I cannot promise release of the code yet as I have to present it
>>>>>>>>>> to our internal Intellectual Property group first.
>>>>>>>>>>
>>>>>>>>>> Thoughts?
>>>>>>>>>>
>>>>>>>>>> Claude

--
I like: Like Like - The likeliest place on the web <http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren
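
For anyone reading the archive later: the three-table plan quoted in the thread (SPO, POS, OPS) implies some rule for deciding which table serves a given find(S, P, O) pattern. Below is a minimal sketch of one such rule, picking the table whose key order has the longest run of bound leading components. It is not code from the project (none existed at the time of the thread). The class name `IndexChooser`, the `Index` enum, and the use of `null` as the wildcard are all assumptions made for illustration; no Jena or Cassandra classes are involved.

```java
import java.util.Optional;

public class IndexChooser {

    /** The three tables named in the thread; each is keyed in the order its name spells. */
    public enum Index {
        SPO, POS, OPS;

        /** Counts how many leading key components of this table are bound by the pattern. */
        int boundPrefix(boolean s, boolean p, boolean o) {
            int n = 0;
            for (char c : name().toCharArray()) {
                boolean bound = (c == 'S') ? s : (c == 'P') ? p : o;
                if (!bound) {
                    break; // the usable key prefix ends at the first wildcard
                }
                n++;
            }
            return n;
        }
    }

    /**
     * Picks the table with the longest bound key prefix (ties go to the first
     * table in declaration order). null stands in for ANY; that is an
     * assumption of this sketch. Returns empty for find(ANY, ANY, ANY),
     * where no key component is bound and only a full scan can answer.
     */
    public static Optional<Index> choose(String s, String p, String o) {
        Index best = null;
        int bestLen = 0;
        for (Index idx : Index.values()) {
            int len = idx.boundPrefix(s != null, p != null, o != null);
            if (len > bestLen) {
                best = idx;
                bestLen = len;
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        System.out.println(choose("s", null, null)); // subject only
        System.out.println(choose(null, "p", "o"));  // predicate and object
        System.out.println(choose(null, null, null)); // nothing bound
    }
}
```

Under this rule, with exactly the table set named in the thread, a pattern of the form (S, ANY, O) only ever gets a one-component prefix (S from SPO, or O from OPS), so the remaining component would have to be filtered after the read. Whether that matters depends on the use cases discussed above.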