Claude,

The general idea of support for serialization makes a lot of sense, e.g. for rdf-hadoop and DataBags.

The specifics of Java serialization - not necessarily so. We might be forced into it for Java RMI, if that's a goal, but there are other RPC mechanisms, including multi-language ones (Thrift).

When reading/writing to storage, multi-language support makes a lot of sense.

It might be better to have an SNode that implements java.io.Serializable and simply wraps a single Node, then have that class provide all the writeObject/readObject. This isolates the RMI concern, at the cost of an extra indirection.
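For concreteness, a minimal sketch of that wrapper - the class name and the string encoding (FmtUtils out, SSE back in) are just one possible choice:

    import java.io.IOException;
    import java.io.ObjectInputStream;
    import java.io.ObjectOutputStream;
    import java.io.Serializable;

    import com.hp.hpl.jena.graph.Node;
    import com.hp.hpl.jena.sparql.sse.SSE;
    import com.hp.hpl.jena.sparql.util.FmtUtils;

    /** Serializable wrapper around a single Node (sketch). */
    public class SNode implements Serializable {
        private static final long serialVersionUID = 1L;

        // Node itself is not Serializable, so mark it transient and
        // handle it by hand in writeObject/readObject.
        private transient Node node;

        public SNode(Node node) { this.node = node; }

        public Node get() { return node; }

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();
            // One possible wire form: the node's string rendering.
            // A real version needs stable bnode labels (see below).
            out.writeUTF(FmtUtils.stringForNode(node));
        }

        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            node = SSE.parseNode(in.readUTF());
        }
    }

RMI code then passes SNode around and calls get() at the far end; java.io.Serializable stays out of the Node hierarchy itself.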

Serializable for Graph is a whole different discussion! Exchange may involve structures like tuples. At least with graphs there are no cycle issues, but graphs can be big. So for me it comes down to understanding what sort of RMI operations are the design target.

Serialization does require stable bnode labels - hence using 2 longs for a global ID (the label itself is more a convenience for small-scale and debugging uses).
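Roughly this (names hypothetical) - the two longs are the identity, and the printable label is derived from them:

    import java.io.Serializable;
    import java.util.UUID;

    /** Sketch of a 128-bit global blank node id: the two longs are
     *  the identity; the label is derived, for display only. */
    public final class BlankNodeId implements Serializable {
        private static final long serialVersionUID = 1L;

        private final long hi;
        private final long lo;

        public BlankNodeId(long hi, long lo) { this.hi = hi; this.lo = lo; }

        public static BlankNodeId create() {
            UUID uuid = UUID.randomUUID();
            return new BlankNodeId(uuid.getMostSignificantBits(),
                                   uuid.getLeastSignificantBits());
        }

        /** Label for small-scale and debugging uses only. */
        public String label() {
            return String.format("B%016x%016x", hi, lo);
        }

        @Override public boolean equals(Object other) {
            if (!(other instanceof BlankNodeId)) return false;
            BlankNodeId b = (BlankNodeId) other;
            return hi == b.hi && lo == b.lo;
        }

        @Override public int hashCode() {
            return (int) (hi ^ (hi >>> 32) ^ lo ^ (lo >>> 32));
        }
    }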

On 07/06/14 14:13, Claude Warren wrote:
> I would still like to see Node as a serializable object, or some standard
> mechanism to get a serialized version of the node. Any thoughts along
> this path would be appreciated.

> I had thought about something along the lines of a type byte and raw data
> as a serialized form. But that would mean that each type would have to
> "register" so we could keep them from stepping on each other. This, I
> realize, is wholly unworkable.

I'm using protocol buffers [*] in Lizard. I will be writing an RDF.proto (unless I find one; it's not hard) as I have to transmit Nodes, although at the moment, for expedience, I'm sending strings around.

Side effect: a protocol buffer encoding for the TDB node table.

> So I am back to thinking we should make the Node Serializable.

> Basically, I want to be able to serialize the node out so I can store it
> and deserialize it on demand, without having to worry about new and
> strange Node types. This will make a remote client (using connections
> other than SPARQL, à la RMI) easier, and will make the implementation of
> the Graph SPI easier for some types of storage (e.g. Hadoop).

We can concentrate on the core node types from RDF 1.1 + named variables (not NodeExt or NodeSymbol or NodeGraph).
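With a closed set like that, the type-byte idea above becomes workable - there is nothing to register. Something like this (tag values arbitrary):

    /** Sketch: type tags for a serialized node form, covering the
     *  RDF 1.1 core terms plus named variables. */
    public enum NodeKind {
        IRI      ((byte)1),
        BLANK    ((byte)2),  // followed by the global id (2 longs)
        LITERAL  ((byte)3),  // lexical form + datatype IRI + optional lang tag
        VARIABLE ((byte)4);  // variable name

        private final byte tag;
        NodeKind(byte tag) { this.tag = tag; }
        public byte tag() { return tag; }
    }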

(
PS I'm having trouble seeing why interfaces for Triple and Quad make any sense. Thoughts? I'm already wondering what debugging will feel like, and whether doing it completely the other way round - one single über Node class for the usual suspects - would be better.
)


> Claude

[*] Why protocol buffers and not Thrift/Avro/a.n.other?

Protocol buffers are integrated into netty, so I don't have to do that integration myself. I'm using netty 5.0.0-alpha; netty+thrift support is out of date.

So the choice is netty+PB vs Thrift. The service layer in Thrift seems too RPC-ish; Lizard needs streams. I don't know enough to decide for sure, and the netty documentation is better. Switching between the two should be possible, as the details of PB+M aren't exposed. I'd like to do both.
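For what it's worth, the netty integration is only a few pipeline handlers anyway. A sketch using the netty 4.x handler names, where RDF.Node stands for a message from the (hypothetical, not yet written) RDF.proto and NodeHandler for the application handler:

    import io.netty.channel.ChannelInitializer;
    import io.netty.channel.ChannelPipeline;
    import io.netty.channel.socket.SocketChannel;
    import io.netty.handler.codec.protobuf.ProtobufDecoder;
    import io.netty.handler.codec.protobuf.ProtobufEncoder;
    import io.netty.handler.codec.protobuf.ProtobufVarint32FrameDecoder;
    import io.netty.handler.codec.protobuf.ProtobufVarint32LengthFieldPrepender;

    public class NodeChannelInitializer extends ChannelInitializer<SocketChannel> {
        @Override
        protected void initChannel(SocketChannel ch) {
            ChannelPipeline p = ch.pipeline();
            // Inbound: varint length-prefixed frames -> RDF.Node messages.
            p.addLast(new ProtobufVarint32FrameDecoder());
            p.addLast(new ProtobufDecoder(RDF.Node.getDefaultInstance()));
            // Outbound: RDF.Node messages -> varint length-prefixed frames.
            p.addLast(new ProtobufVarint32LengthFieldPrepender());
            p.addLast(new ProtobufEncoder());
            p.addLast(new NodeHandler());  // hypothetical application handler
        }
    }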

> On Fri, Jun 6, 2014 at 10:13 AM, Andy Seaborne <[email protected]> wrote:
>
>> Just for discussion, here is a somewhat idealised form of Node:
>>
>> https://svn.apache.org/repos/asf/jena/Experimental/jena3-sketch/
>>
>> As before, there is one "Node" for any RDF term + extras (variables,
>> graphs as nodes of a graph, "extension") because triples and quads are
>> Node,Node,Node ... this layer does not reflect the current RDF
>> restrictions of literals to the object position, or on graph names.
>>
>> Feel free to mess with the code, or put a different design alongside it,
>> or sketch ideas for another area of Jena. No sense of this being "the
>> design".
>>
>>         Andy
>>
>> And from a while ago:
>> http://mail-archives.apache.org/mod_mbox/jena-dev/201211.mbox/%[email protected]%3E




