[ 
https://issues.apache.org/jira/browse/SQOOP-1900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246252#comment-14246252
 ] 

Jarek Jarcec Cecho commented on SQOOP-1900:
-------------------------------------------

If I'm reading the {{write(DataOutput)}} and {{read(DataInput)}} method 
implementations correctly, then for, let's say, an Avro-based IDF we would do 
the following: on one machine we would have Avro objects in memory, the 
{{getCSVTextData()}} call would serialize the objects into a String that could 
be transferred to a different machine in the cluster, and finally the text 
representation would be deserialized back into Avro objects in the 
{{setCSVTextData()}} call.

I believe the intention of having these methods in the IDF was that the IDF 
itself can serialize its native format, so that we don't need any extra 
serialization/deserialization step. E.g., {{write(DataOutput out)}} would be 
implemented simply as {{toIDF.write(out)}}. That would simplify the scenario 
above: keep Avro objects in memory, serialize them into bytes in the way that 
is most efficient for Avro, transfer them across the wire, and then directly 
deserialize them back into Avro objects.
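
A minimal sketch of that idea, not the actual Sqoop or Avro code: 
{{SketchIdf}}, {{NativeRecord}}, {{NativeRecordIdf}} and {{WireDemo}} are 
hypothetical names, and {{NativeRecord}} only stands in for an Avro object. 
The point is that {{write}}/{{read}} move the native representation directly, 
with no CSV text step in between.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for the IDF base class; only the two methods discussed here.
abstract class SketchIdf<T> {
    protected T data;

    T getData() { return data; }
    void setData(T data) { this.data = data; }

    abstract void write(DataOutput out) throws IOException;
    abstract void read(DataInput in) throws IOException;
}

// Hypothetical native object; a real Avro-based IDF would hold an Avro record instead.
final class NativeRecord {
    final int id;
    final String name;

    NativeRecord(int id, String name) {
        this.id = id;
        this.name = name;
    }
}

// The IDF serializes its own native format; no detour through a CSV String.
final class NativeRecordIdf extends SketchIdf<NativeRecord> {
    @Override
    void write(DataOutput out) throws IOException {
        // Assumption: for Avro this would use Avro's own binary encoding instead.
        out.writeInt(data.id);
        out.writeUTF(data.name);
    }

    @Override
    void read(DataInput in) throws IOException {
        data = new NativeRecord(in.readInt(), in.readUTF());
    }
}

// Simulates the wire transfer between two machines with an in-memory byte array.
final class WireDemo {
    static byte[] toBytes(SketchIdf<?> idf) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            idf.write(new DataOutputStream(buffer));
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static void fromBytes(SketchIdf<?> idf, byte[] bytes) {
        try {
            idf.read(new DataInputStream(new ByteArrayInputStream(bytes)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

On the sending side one would call {{WireDemo.toBytes(idf)}}, ship the bytes, 
and on the receiving side call {{WireDemo.fromBytes(otherIdf, bytes)}} to get 
the native object back through {{getData()}}.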

I'm not familiar with Spark, but I'm assuming there has to be a similar API 
that serializes/deserializes data for wire transfer?

> IDF API read/ write method 
> ---------------------------
>
>                 Key: SQOOP-1900
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1900
>             Project: Sqoop
>          Issue Type: Sub-task
>          Components: sqoop2-framework
>            Reporter: Veena Basavaraj
>             Fix For: 1.99.5
>
>
> At this point I am not clear what the real use of the following 2 methods is 
> in the IDF API. Can anyone explain? I have not seen them used anywhere in the 
> code; I might be missing something.
> {code}
>   /**
>    * Serialize the fields of this object to <code>out</code>.
>    *
>    * @param out <code>DataOutput</code> to serialize this object into.
>    * @throws IOException
>    */
>   public abstract void write(DataOutput out) throws IOException;
>   /**
>    * Deserialize the fields of this object from <code>in</code>.
>    *
>    * <p>For efficiency, implementations should attempt to re-use storage in the
>    * existing object where possible.</p>
>    *
>    * @param in <code>DataInput</code> to deserialize this object from.
>    * @throws IOException
>    */
>   public abstract void read(DataInput in) throws IOException;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
