[
https://issues.apache.org/jira/browse/SQOOP-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14259152#comment-14259152
]
Veena Basavaraj edited comment on SQOOP-1901 at 12/26/14 5:27 PM:
------------------------------------------------------------------
Summary from the Review Board discussions.
In the case of CSVIDF, data is the CSV string, so we construct it eagerly when
the object array is set: setObjectData(..) immediately builds the csvText
(stored in data), and everything else is lazy. That is, CSVIDF builds the object
array from the csvText (i.e. data) only when getObjectData() is called, and it
never stores the object array.
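To make the CSVIDF rule concrete, here is a minimal Java sketch under stated assumptions. `SketchCsvIdf` is a hypothetical class, not Sqoop's CSV IDF. Fields are limited to `String` and `Long`. The CSV encoding is a toy one: strings in single quotes, longs written bare, and no escaping of commas or quotes. The sketch shows setObjectData building the CSV text eagerly, while getObjectData parses a fresh array on every call and never caches it.

```java
// Hypothetical, simplified stand-in for a CSV-based IDF (not the Sqoop class).
// Toy encoding: String fields are wrapped in single quotes, Long fields are
// written as-is, and commas or quotes inside strings are NOT escaped.
class SketchCsvIdf {
  // Source of truth: the CSV text. No object array is ever stored.
  private String data;

  // Eager: the CSV text is built immediately, because it is what we store.
  void setObjectData(Object[] fields) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < fields.length; i++) {
      if (i > 0) {
        sb.append(',');
      }
      if (fields[i] instanceof String) {
        sb.append('\'').append(fields[i]).append('\'');
      } else {
        sb.append(fields[i]);
      }
    }
    data = sb.toString();
  }

  void setCSVText(String csv) {
    data = csv;
  }

  String getCSVText() {
    return data;
  }

  // Lazy: a fresh object array is parsed from the CSV text on every call
  // and is not cached.
  Object[] getObjectData() {
    String[] parts = data.split(",", -1);
    Object[] fields = new Object[parts.length];
    for (int i = 0; i < parts.length; i++) {
      String p = parts[i];
      if (p.length() >= 2 && p.startsWith("'") && p.endsWith("'")) {
        fields[i] = p.substring(1, p.length() - 1);
      } else {
        fields[i] = Long.parseLong(p);
      }
    }
    return fields;
  }
}
```

Because nothing is cached, each getObjectData() call pays the parsing cost again. That is the trade-off of keeping a single source of truth.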
In the case of JSONIDF, data is a JSONObject (and for any other IDF, say AvroIDF,
data is the Avro object), so we keep only the JSON/Avro object in memory and
construct both the CSV text and the object array lazily.
Here are the high-level details of what each method in the JSON IDF will do:
1. Store the source of truth in data, i.e. the JSONObject.
2. When setData is called, store the JSONObject in data and nothing else;
everything else is lazy.
3. When setObjectData is called, construct the JSONObject from the object array;
do not store the CSV text or the object array.
4. When setCsvText is called, likewise construct the JSONObject from it; do not
store the CSV text or the object array.
5. When getObjectData is called, convert the JSONObject to an object array on
demand (i.e. lazily), since we are not sure these methods will be called in
all cases.
6. When getCSVText is called, convert the JSONObject to CSV text, lazily as
above.
7. When getData is called, return the JSONObject.
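The seven steps above can be sketched as follows. The sketch is hypothetical. `SketchJsonIdf` is not the Sqoop JSON IDF, and a `LinkedHashMap` stands in for `JSONObject` so no JSON library is needed. An array of column names stands in for the schema, and the method names only loosely follow the comment. The same toy CSV encoding is used: single-quoted strings, bare longs, no escaping. The only stored field is `data`; the object array and the CSV text are derived on demand.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the JSON IDF rules (not the Sqoop class).
// Assumptions: a LinkedHashMap stands in for JSONObject, an array of column
// names stands in for the schema, and CSV uses the toy encoding
// (single-quoted strings, bare longs, no escaping).
class SketchJsonIdf {
  private final String[] columns;    // hypothetical schema: column order
  private Map<String, Object> data;  // (1) the only stored state

  SketchJsonIdf(String... columns) {
    this.columns = columns;
  }

  // (2) store the object and nothing else
  void setData(Map<String, Object> json) {
    data = json;
  }

  // (3) build the JSON object; the array itself is not kept
  void setObjectData(Object[] fields) {
    Map<String, Object> json = new LinkedHashMap<>();
    for (int i = 0; i < columns.length; i++) {
      json.put(columns[i], fields[i]);
    }
    data = json;
  }

  // (4) parse the CSV into the JSON object; the CSV string is not kept
  void setCSVText(String csv) {
    String[] parts = csv.split(",", -1);
    Object[] fields = new Object[parts.length];
    for (int i = 0; i < parts.length; i++) {
      String p = parts[i];
      if (p.length() >= 2 && p.startsWith("'") && p.endsWith("'")) {
        fields[i] = p.substring(1, p.length() - 1);
      } else {
        fields[i] = Long.parseLong(p);
      }
    }
    setObjectData(fields);
  }

  // (5) lazy: derived from the JSON object on each call
  Object[] getObjectData() {
    Object[] fields = new Object[columns.length];
    for (int i = 0; i < columns.length; i++) {
      fields[i] = data.get(columns[i]);
    }
    return fields;
  }

  // (6) lazy: derived from the JSON object on each call
  String getCSVText() {
    Object[] fields = getObjectData();
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < fields.length; i++) {
      if (i > 0) {
        sb.append(',');
      }
      if (fields[i] instanceof String) {
        sb.append('\'').append(fields[i]).append('\'');
      } else {
        sb.append(fields[i]);
      }
    }
    return sb.toString();
  }

  // (7) return the stored object
  Map<String, Object> getData() {
    return data;
  }
}
```

The CSV parsing and formatting in this sketch repeat a CSV IDF's logic almost line for line. That duplication is what the issue description quoted below proposes to remove.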
was (Author: vybs): same text as above, without the step numbering.
> Supporting DRY code in new IDF implementation JSONIDF
> -----------------------------------------------------
>
> Key: SQOOP-1901
> URL: https://issues.apache.org/jira/browse/SQOOP-1901
> Project: Sqoop
> Issue Type: Sub-task
> Components: sqoop2-framework
> Reporter: Veena Basavaraj
> Assignee: Veena Basavaraj
> Fix For: 1.99.5
>
> Attachments: SQOOP-1901-v2.patch
>
>
> As the title suggests, we want to encourage DRY code in the new IDF
> implementations.
> As the IDF API mandates the CSV and object formats for all its
> sub-implementations, I propose we move the common functionality to the base
> IDF class so that JSON IDF or AvroIDF does not have to repeat this code.
> The only parts of the code that need to be in subclasses are how they handle
> the conversion between the "T" (generic parameter) and the CSV/object
> representations.
> I saw that http://ingest.tips/2014/12/11/sqoop-1-99-4-release/ mentions
> extending from CSVIDF, and this cannot technically work since the generic T
> will be different for AvroIDF or JSON IDF.
> Update:
> Also, extending from CSVIDF seems a bit illogical: since the IDF API says that
> it needs CSV and object array, the functionality for converting between the
> two (i.e. text to object and object to text) should be in the base class.
>
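Here is one hedged sketch of how the proposal could look. `SketchBaseIdf<T>` and `SketchMapIdf` are hypothetical names, not Sqoop classes. The CSV helpers use a toy encoding: single-quoted strings, bare longs, no escaping. The base class owns the CSV <-> object array conversion once. A subclass only supplies the two conversions between its own T and Object[], which is where the generic T differs between a JSON and an Avro IDF.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical base class for the proposal (not Sqoop's IntermediateDataFormat).
// The CSV <-> Object[] conversion is written once here; a subclass only
// converts between its own representation T and Object[].
abstract class SketchBaseIdf<T> {
  protected T data;  // source of truth, whatever T is

  // The only format-specific code a subclass has to provide.
  protected abstract T fromObjectArray(Object[] fields);
  protected abstract Object[] toObjectArray(T value);

  void setData(T value) { data = value; }
  T getData() { return data; }

  void setObjectData(Object[] fields) { data = fromObjectArray(fields); }
  Object[] getObjectData() { return toObjectArray(data); }

  void setCSVText(String csv) { data = fromObjectArray(parseCsv(csv)); }
  String getCSVText() { return toCsv(toObjectArray(data)); }

  // Shared toy CSV helpers: single-quoted strings, bare longs, no escaping.
  static String toCsv(Object[] fields) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < fields.length; i++) {
      if (i > 0) {
        sb.append(',');
      }
      if (fields[i] instanceof String) {
        sb.append('\'').append(fields[i]).append('\'');
      } else {
        sb.append(fields[i]);
      }
    }
    return sb.toString();
  }

  static Object[] parseCsv(String csv) {
    String[] parts = csv.split(",", -1);
    Object[] fields = new Object[parts.length];
    for (int i = 0; i < parts.length; i++) {
      String p = parts[i];
      if (p.length() >= 2 && p.startsWith("'") && p.endsWith("'")) {
        fields[i] = p.substring(1, p.length() - 1);
      } else {
        fields[i] = Long.parseLong(p);
      }
    }
    return fields;
  }
}

// Example subclass: T is a map standing in for a JSON object.
class SketchMapIdf extends SketchBaseIdf<Map<String, Object>> {
  private final String[] columns;  // hypothetical schema: column order

  SketchMapIdf(String... columns) {
    this.columns = columns;
  }

  @Override
  protected Map<String, Object> fromObjectArray(Object[] fields) {
    Map<String, Object> json = new LinkedHashMap<>();
    for (int i = 0; i < columns.length; i++) {
      json.put(columns[i], fields[i]);
    }
    return json;
  }

  @Override
  protected Object[] toObjectArray(Map<String, Object> value) {
    Object[] fields = new Object[columns.length];
    for (int i = 0; i < columns.length; i++) {
      fields[i] = value.get(columns[i]);
    }
    return fields;
  }
}
```

Under this layout, a CSV IDF would become one more subclass with T = String that delegates to the shared helpers. It could also override setCSVText/getCSVText to skip the round trip through Object[]. A JSON or Avro IDF would implement only the two abstract methods.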
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)