[ 
https://issues.apache.org/jira/browse/SQOOP-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230073#comment-14230073
 ] 

Jarek Jarcec Cecho commented on SQOOP-1811:
-------------------------------------------

The idea behind {{IntermediateDataFormat}} is to let a connector use an 
arbitrary format for its data. It is an abstract concept, and Sqoop doesn't 
impose any restrictions on how the connector represents the data 
internally. The "internal" format will be whatever is native 
to the given connector. JDBC-based connectors will likely use CSV or an Object 
array, and fast connectors will most likely end up with CSV. More advanced 
connectors might use more advanced structures. We have the abstract template 
methods {{getData()}} and {{setData()}} so that we can do some basic work with 
this abstract structure.
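
To make the "native format" point concrete, here is a minimal sketch. It is not the actual Sqoop class: the names {{NativeFormatSketch}}, {{ObjectArrayFormatSketch}} and {{CsvLineFormatSketch}} are hypothetical, and only {{getData()}} and {{setData()}} come from the discussion above.

```java
// Hypothetical sketch, not the actual Sqoop class: the base type is generic
// in T, so each connector picks whatever representation is native to it.
abstract class NativeFormatSketch<T> {
    protected T data;

    // Hand back the connector's native representation untouched.
    public T getData() {
        return data;
    }

    public void setData(T data) {
        this.data = data;
    }
}

// A JDBC-style connector might keep each row as an Object array.
class ObjectArrayFormatSketch extends NativeFormatSketch<Object[]> {
}

// A fast connector might keep each row as a single CSV line.
class CsvLineFormatSketch extends NativeFormatSketch<String> {
}
```

Code that only moves the data around can call {{getData()}} and {{setData()}} without knowing what T is, which is all the "basic work" this layer is meant to support.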

However, in order to interpret the data, we need the IDF to expose methods that 
convert the internal format to something that is agreed upon between the 
IDF and the rest of the Sqoop code base, so that instead of one abstract object 
we get data that we can interpret in terms of columns and their values. We are 
asking the IDF to expose that via two different method families. The first is an 
Object representation via the methods {{getObjectData()}} and 
{{setObjectData()}}, which work with Java objects corresponding to the column 
values. The second is a CSV-ish representation that follows very strict 
formatting rules. I've tried to explain why we are requesting both formats in my [earlier 
comment|https://issues.apache.org/jira/browse/SQOOP-1811?focusedCommentId=14226919&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14226919].
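
As a rough illustration of the two method families, the sketch below keeps a list of column values as its native format and exposes both views. The class names are hypothetical, and the quoting rule (strings in single quotes, everything else parsed as a long, no commas or quotes inside values) is a deliberate simplification, not the strict CSV rules Sqoop defines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the two agreed-upon views; not the actual Sqoop class.
abstract class TwoViewFormatSketch<T> {
    protected T data;

    public T getData() { return data; }
    public void setData(T data) { this.data = data; }

    // Object view: one Java object per column value.
    public abstract Object[] getObjectData();
    public abstract void setObjectData(Object[] row);

    // Text view: one CSV-ish line per row.
    public abstract String getTextData();
    public abstract void setTextData(String line);
}

// Toy implementation whose native format is a list of column values.
class ListBackedFormatSketch extends TwoViewFormatSketch<List<Object>> {
    @Override
    public Object[] getObjectData() {
        return data.toArray();
    }

    @Override
    public void setObjectData(Object[] row) {
        data = new ArrayList<>(Arrays.asList(row));
    }

    @Override
    public String getTextData() {
        // Simplified rule (assumption): strings are single-quoted,
        // every other value is written with String.valueOf().
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < data.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            Object value = data.get(i);
            sb.append(value instanceof String ? "'" + value + "'" : String.valueOf(value));
        }
        return sb.toString();
    }

    @Override
    public void setTextData(String line) {
        // Simplified parsing (assumption): no commas or quotes inside values,
        // and every unquoted field is a long.
        List<Object> row = new ArrayList<>();
        for (String field : line.split(",")) {
            if (field.length() >= 2 && field.startsWith("'") && field.endsWith("'")) {
                row.add(field.substring(1, field.length() - 1));
            } else {
                row.add(Long.valueOf(field));
            }
        }
        data = row;
    }
}
```

Whichever view the caller uses to write a row, it can read the row back through the other view, so the rest of the code never needs to know the native type.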

As a result, the text format in the IDF is intentionally required. We are asking 
the IDF to expose the text format to us, and the methods are defined as 
abstract so that the IDF can convert its internal structure into text lazily 
(only if we need the text representation). {{getObjectData()}} and 
{{setObjectData()}} work the same way. I'm wondering what different goals we are 
trying to achieve here?
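
A small sketch of what "lazily" means here, with hypothetical names throughout: the native form is a {{long[]}}, and the text is only built when {{getTextData()}} is called. The {{conversions}} counter exists purely to make that visible.

```java
// Hypothetical sketch of lazy text conversion; not the actual Sqoop class.
abstract class LazyTextFormatSketch<T> {
    protected T data;

    public T getData() { return data; }
    public void setData(T data) { this.data = data; }

    // Abstract so that each format decides how, and when, to build the text.
    public abstract String getTextData();
}

class LongRowFormatSketch extends LazyTextFormatSketch<long[]> {
    // Counts conversions, only to demonstrate that nothing happens up front.
    int conversions = 0;

    @Override
    public String getTextData() {
        conversions++;
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < data.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(data[i]);
        }
        return sb.toString();
    }
}
```

Setting the data costs nothing beyond storing the reference; the string is produced only when a caller actually asks for it. A base class that stored the text in its own field would presumably need every format to produce that string eagerly.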


> IDF API changes
> ---------------
>
>                 Key: SQOOP-1811
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1811
>             Project: Sqoop
>          Issue Type: Sub-task
>          Components: sqoop2-framework
>            Reporter: Veena Basavaraj
>             Fix For: 1.99.5
>
>
> 1. Update the Javadocs for the IDF APIs.
> 2. Make getTextData final and rename the pair to getCSV and setCSV, so it is 
> obvious that we want to enforce the CSV format.
>  The following code can move to the base class IntermediateDataFormat and 
> be made final, so there is no way to override it and we can enforce that all 
> implementations return String instead of the generic T.
> {code}
> // Hold the string in the IDF base class (not final, because the setter assigns it)
>   private String text;
>  
>   public final String getCSVTextData() {
>     return text;
>   }
>  
>   public final void setCSVTextData(String text) {
>     this.text = text;
>   }
> {code}
> There is code in the CSVIDF implementation that has the rules for CSV parsing; 
> it can be pulled out into CSV utils so that the connectors can use it.
> The T in the CSV IDF happens to be String, which is just a coincidence. If I 
> write a new IDF implementation, T can be a custom object that encapsulates the 
> whole row.
> Third, getData and setData can have custom implementations, so they can be 
> overridden to return the generic type T.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
