[ 
https://issues.apache.org/jira/browse/JENA-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15642341#comment-15642341
 ] 

Andy Seaborne commented on JENA-1233:
-------------------------------------

A plan:

Java serialization has a mechanism to allow an object to be serialized by 
another object. This plan uses that to put the serialization code into ARQ.

https://docs.oracle.com/javase/8/docs/platform/serialization/spec/serialTOC.html

We need to decouple {{Node}} and {{Triple}} from their serialization; otherwise 
we limit the possible serialization implementations to what is available to 
jena-core.

To make {{Node}} and {{Triple}} serializable, use {{writeReplace}} and 
{{readResolve}} to produce a serializable wrapper. These work by inserting a 
different object into the serialization stream. 
{{SNode}}/{{STriple}}/{{SQuad}} are the explicit wrappers.

{{Node}}/{{Triple}}/{{Quad}} each have an injected function that 
{{writeReplace}} calls, so the serialization is not fixed. The binary form 
using Thrift will be injected by ARQ when Jena initializes.

Sketch (a better injection mechanism is needed to avoid cluttering the API of 
{{Node}}):
{noformat}
public abstract class Node implements Serializable {
    // Returned Object must provide "readResolve()" that returns a Node. 
    public static Function<Node, Object> replacement = null ;
    protected Object writeReplace() throws ObjectStreamException {
        return replacement.apply(this);
    }
...    
{noformat}
NB No "serialVersionUID" here - it is given in the replacement Object so 
different serializations will not get mixed up.
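
To make the round trip concrete, here is a minimal, self-contained sketch of 
the {{writeReplace}}/{{readResolve}} pairing. Everything in it is a stand-in: 
{{DemoNode}} and {{SDemoNode}} are hypothetical classes, not the real 
{{Node}}/{{SNode}}, the wrapper carries a plain string rather than the Thrift 
form, and throwing {{NotSerializableException}} when nothing has been injected 
is my assumption, not part of the plan above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamException;
import java.io.Serializable;
import java.util.function.Function;

public class WriteReplaceSketch {

    // Hypothetical stand-in for Node. No serialVersionUID: instances of this
    // class never reach the stream, only the replacement object does.
    public static class DemoNode implements Serializable {
        // Injected at initialization by the module that owns the wire format.
        public static Function<DemoNode, Object> replacement = null;

        public final String label;

        public DemoNode(String label) {
            this.label = label;
        }

        // Called by ObjectOutputStream before writing; swaps in the wrapper.
        protected Object writeReplace() throws ObjectStreamException {
            if (replacement == null) {
                // Assumption: fail clearly if no serializer was injected.
                throw new NotSerializableException("No serializer injected for DemoNode");
            }
            return replacement.apply(this);
        }
    }

    // Hypothetical stand-in for SNode: the object actually written.
    public static class SDemoNode implements Serializable {
        private static final long serialVersionUID = 1L;

        private final String encoded;

        public SDemoNode(DemoNode node) {
            this.encoded = node.label;
        }

        // Called by ObjectInputStream after reading; swaps the node back in.
        protected Object readResolve() throws ObjectStreamException {
            return new DemoNode(encoded);
        }
    }

    // Serialize to a byte array and read the result back.
    public static Object roundTrip(Object obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        DemoNode.replacement = SDemoNode::new;   // the "injection" step
        DemoNode copy = (DemoNode) roundTrip(new DemoNode("http://example/s"));
        System.out.println(copy.label);
    }
}
```

The caller only ever sees {{DemoNode}}; the wrapper exists just for the 
lifetime of the write or read.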

This means that jena-core, on its own without ARQ, does not support 
serialization of {{Node}} and {{Triple}}. As running jena-core without jena-arq 
is to be discouraged anyway, that is no bad thing. A string-based form could be 
provided, but it would not support quads.

Alt plan: have two injected functions for {{writeObject}} and {{readObject}}, 
then the serialVersionUID comes from {{Node}} and is the same for all 
implementations.

I don't see sufficient advantage in this. It looks more like a "normal" 
implementation than the {{writeReplace}}/{{readResolve}} dance, and it does not 
create the short-lived intermediate object, but it comes at the cost of the 
same serialVersionUID being shared across implementations.
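
For comparison, a rough sketch of what the alt plan could look like. Again 
{{DemoNode}} is a hypothetical stand-in, the {{NodeWriter}}/{{NodeReader}} 
interfaces are invented for the illustration, and the payload is a plain UTF 
string rather than any real encoding. Note that the serialVersionUID now sits 
on the node class itself, and the field has to be non-final so that 
{{readObject}} can set it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class InjectedReadWriteSketch {

    // Hypothetical function types; java.util.function has no IOException-throwing variants.
    public interface NodeWriter {
        void write(DemoNode node, ObjectOutputStream out) throws IOException;
    }

    public interface NodeReader {
        String read(ObjectInputStream in) throws IOException;
    }

    // Hypothetical stand-in for Node.
    public static class DemoNode implements Serializable {
        // One UID on the node class, shared by every injected implementation.
        private static final long serialVersionUID = 1L;

        // The two injected functions.
        public static NodeWriter writer = null;
        public static NodeReader reader = null;

        // transient: default serialization skips it; non-final: readObject assigns it.
        public transient String label;

        public DemoNode(String label) {
            this.label = label;
        }

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();      // no non-transient fields, kept for form
            writer.write(this, out);       // delegate the payload to the injected function
        }

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            this.label = reader.read(in);  // no intermediate wrapper object is created
        }
    }

    // Serialize to a byte array and read the result back.
    public static Object roundTrip(Object obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        DemoNode.writer = (node, out) -> out.writeUTF(node.label);
        DemoNode.reader = in -> in.readUTF();
        DemoNode copy = (DemoNode) roundTrip(new DemoNode("http://example/s"));
        System.out.println(copy.label);
    }
}
```

Nothing in the stream records which writer produced the bytes, so a reader 
injected by a different implementation would be handed data it may not 
understand.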


> Make RDF primitives Serializable
> --------------------------------
>
>                 Key: JENA-1233
>                 URL: https://issues.apache.org/jira/browse/JENA-1233
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: Elephas
>    Affects Versions: Jena 3.1.0
>            Reporter: Itsuki Toyota
>
> I always use Jena when I handle RDF data with Apache Spark.
> However, when I want to store resulting RDD data (e.g. RDD[Triple]) in binary 
> format, I can't call the RDD.saveAsObjectFile method.
> This is because RDD.saveAsObjectFile requires the java.io.Serializable interface.
> See the following code. 
> https://github.com/apache/spark/blob/v1.6.0/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L1469
> https://github.com/apache/spark/blob/v1.6.0/core/src/main/scala/org/apache/spark/util/Utils.scala#L79-L86
> You can see that 
> 1) RDD.saveAsObjectFile calls the Utils.serialize method
> 2) The Utils.serialize method requires the RDD-wrapped object to implement 
> the java.io.Serializable interface. For example, if you want to save an 
> RDD[Triple] object, Triple must implement java.io.Serializable.
> So why not implement java.io.Serializable?
> I think it will improve the usability in Apache Spark.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
