Claude,

The general idea of support for serialization makes a lot of sense, e.g. for rdf-hadoop and DataBags.

The specifics of Java serialization - not necessarily so. We might be forced into it for Java RMI, if that's a goal, but there are other RPC mechanisms, including multi-language ones (Thrift).

When reading/writing to storage, multi-language support makes a lot of sense.

It might be better to have an SNode that implements java.io.Serializable and simply wraps a single Node, then have that class provide all the writeObject/readObject. This isolates the RMI concern, at the cost of an extra indirection.
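For concreteness, a minimal sketch of that wrapper - the class name and the string encoding (FmtUtils out, SSE back in) are just one possible choice:

    import java.io.IOException;
    import java.io.ObjectInputStream;
    import java.io.ObjectOutputStream;
    import java.io.Serializable;

    import com.hp.hpl.jena.graph.Node;
    import com.hp.hpl.jena.sparql.sse.SSE;
    import com.hp.hpl.jena.sparql.util.FmtUtils;

    /** Serializable wrapper around a single Node (sketch). */
    public class SNode implements Serializable {
        private static final long serialVersionUID = 1L;

        // Node itself is not Serializable, so mark it transient and
        // handle it by hand in writeObject/readObject.
        private transient Node node;

        public SNode(Node node) { this.node = node; }

        public Node get() { return node; }

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();
            // One possible wire form: the node's string rendering.
            // A real version needs stable bnode labels (see below).
            out.writeUTF(FmtUtils.stringForNode(node));
        }

        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            node = SSE.parseNode(in.readUTF());
        }
    }

RMI code then passes SNode around and calls get() at the far end; java.io.Serializable stays out of the Node hierarchy itself.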

Serializable for Graph is a whole different discussion! Exchange may involve structures like tuples. At least with graphs there are no cycle issues, but graphs can be big. So for me it comes down to understanding what sort of RMI operations are the design target.

Serialization does require stable bnode labels - hence using 2 longs for a global ID (the label itself is more a convenience for small-scale and debugging uses).
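Roughly this (names hypothetical) - the two longs are the identity, and the printable label is derived from them:

    import java.io.Serializable;
    import java.util.UUID;

    /** Sketch of a 128-bit global blank node id: the two longs are
     *  the identity; the label is derived, for display only. */
    public final class BlankNodeId implements Serializable {
        private static final long serialVersionUID = 1L;

        private final long hi;
        private final long lo;

        public BlankNodeId(long hi, long lo) { this.hi = hi; this.lo = lo; }

        public static BlankNodeId create() {
            UUID uuid = UUID.randomUUID();
            return new BlankNodeId(uuid.getMostSignificantBits(),
                                   uuid.getLeastSignificantBits());
        }

        /** Label for small-scale and debugging uses only. */
        public String label() {
            return String.format("B%016x%016x", hi, lo);
        }

        @Override public boolean equals(Object other) {
            if (!(other instanceof BlankNodeId)) return false;
            BlankNodeId b = (BlankNodeId) other;
            return hi == b.hi && lo == b.lo;
        }

        @Override public int hashCode() {
            return (int) (hi ^ (hi >>> 32) ^ lo ^ (lo >>> 32));
        }
    }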

On 07/06/14 14:13, Claude Warren wrote:
> I would still like to see Node as a serializable object, or some standard
> mechanism to get a serialized version of the node. Any thoughts along
> this path would be appreciated.

> I had thought about something along the lines of a type byte and raw data
> as a serialized form. But that would mean that each type would have to
> "register" so we could keep them from stepping on each other. This, I
> realize, is wholly unworkable.

I'm using protocol buffers [*] in Lizard. I will be writing an RDF.proto (unless I find one; it's not hard) as I have to transmit Nodes, although at the moment, for expedience, I'm sending strings around.

Side effect: a protocol buffer encoding for the TDB node table.

> So I am back to thinking we should make the Node Serializable.

> Basically, I want to be able to serialize the node out so I can store it
> and deserialize it on demand, without having to worry about new and
> strange Node types. This will make a remote client (using connections
> other than SPARQL, à la RMI) easier, and will make the implementation of
> the Graph SPI easier for some types of storage (e.g. Hadoop).

We can concentrate on the core node types from RDF 1.1 + named variables (not NodeExt or NodeSymbol or NodeGraph).
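With a closed set like that, the type-byte idea above becomes workable - there is nothing to register. Something like this (tag values arbitrary):

    /** Sketch: type tags for a serialized node form, covering the
     *  RDF 1.1 core terms plus named variables. */
    public enum NodeKind {
        IRI      ((byte)1),
        BLANK    ((byte)2),  // followed by the global id (2 longs)
        LITERAL  ((byte)3),  // lexical form + datatype IRI + optional lang tag
        VARIABLE ((byte)4);  // variable name

        private final byte tag;
        NodeKind(byte tag) { this.tag = tag; }
        public byte tag() { return tag; }
    }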

(
PS I'm having trouble seeing why interfaces for Triple and Quad make any sense. Thoughts? I'm already wondering what debugging will feel like, and whether doing it completely the other way round - one single über Node class for the usual suspects - would be better.
)


> Claude

[*] Why protocol buffers and not Thrift/Avro/a.n.other?

Protocol buffers are integrated into netty, so I don't have to do that integration myself. I'm using netty 5.0.0-alpha; netty+thrift support is out of date.

So the choice is netty+PB vs Thrift. The service layer in Thrift seems too RPC-ish; Lizard needs streams. I don't know enough to decide for sure, and the netty documentation is better. Switching between the two should be possible, as the details of PB+M aren't exposed. I'd like to do both.
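For what it's worth, the netty integration is only a few pipeline handlers anyway. A sketch using the netty 4.x handler names, where RDF.Node stands for a message from the (hypothetical, not yet written) RDF.proto and NodeHandler for the application handler:

    import io.netty.channel.ChannelInitializer;
    import io.netty.channel.ChannelPipeline;
    import io.netty.channel.socket.SocketChannel;
    import io.netty.handler.codec.protobuf.ProtobufDecoder;
    import io.netty.handler.codec.protobuf.ProtobufEncoder;
    import io.netty.handler.codec.protobuf.ProtobufVarint32FrameDecoder;
    import io.netty.handler.codec.protobuf.ProtobufVarint32LengthFieldPrepender;

    public class NodeChannelInitializer extends ChannelInitializer<SocketChannel> {
        @Override
        protected void initChannel(SocketChannel ch) {
            ChannelPipeline p = ch.pipeline();
            // Inbound: varint length-prefixed frames -> RDF.Node messages.
            p.addLast(new ProtobufVarint32FrameDecoder());
            p.addLast(new ProtobufDecoder(RDF.Node.getDefaultInstance()));
            // Outbound: RDF.Node messages -> varint length-prefixed frames.
            p.addLast(new ProtobufVarint32LengthFieldPrepender());
            p.addLast(new ProtobufEncoder());
            p.addLast(new NodeHandler());  // hypothetical application handler
        }
    }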

> On Fri, Jun 6, 2014 at 10:13 AM, Andy Seaborne <[email protected]> wrote:
>
>> Just for discussion, here is a somewhat idealised form of Node:
>>
>> https://svn.apache.org/repos/asf/jena/Experimental/jena3-sketch/
>>
>> As before, there is one "Node" for any RDF term + extras (variables,
>> graphs as nodes of a graph, "extension") because triples and quads are
>> Node,Node,Node ... this layer does not reflect the current RDF
>> restrictions of literals to the object position, or on graph names.
>>
>> Feel free to mess with the code, or put a different design alongside it,
>> or sketch ideas for another area of Jena. No sense of this being "the
>> design".
>>
>>         Andy
>>
>> And from a while ago:
>> http://mail-archives.apache.org/mod_mbox/jena-dev/201211.mbox/%[email protected]%3E




