[ 
https://issues.apache.org/jira/browse/SQOOP-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14229162#comment-14229162
 ] 

Jarek Jarcec Cecho commented on SQOOP-1811:
-------------------------------------------

I see. Let me restate the proposal in my own words to make sure that we're on 
the same page. The proposal is that the parent {{IntermediateDataFormat}} class 
should hold the {{text}} variable containing the Sqoop CSV-ish representation 
of a given row. This variable will be private to the parent class and 
accessible to the outside world and to child implementations only via the 
public {{setCSVTextData()}} and {{getCSVTextData()}} methods. If my 
understanding is correct, then I have a couple of concerns:

* Defining the variable {{text}} as final means that we will need to 
instantiate a new IDF object for every transferred row, whereas today we use 
one instance for the entire extractor/loader instance.
* When using the methods {{setData()}} and {{setObjectData()}}, the IDF 
implementation is responsible for converting the given data into CSV-ish text 
and calling {{setSqoopCSVString()}}. This means that we will always convert the 
data into CSV-ish text, regardless of whether we need it or not.
* When calling {{getData()}} and {{getObjectData()}}, the IDF implementation 
is responsible for checking whether {{text}} is in sync with the internal 
representation, because otherwise we might end up with data corruption. 
E.g. calling {{setSqoopCSVString()}} won't update the internal representation 
accordingly, and therefore calls to {{getData()}} and {{getObjectData()}} have 
to be guarded.
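
To make the last two concerns concrete, here is a minimal sketch. It is not 
the actual Sqoop code: the class names {{SketchIDF}}, {{ArraySketchIDF}} and 
{{IdfSketchDemo}} are made up for illustration, the {{conversions}} counter 
exists only to make the cost visible, and the CSV join is deliberately naive 
(no quoting or escaping). The {{text}} field is left non-final here, since a 
final field could not be reassigned by the setter.

```java
// Hypothetical stand-in for the proposed parent class; not the real Sqoop API.
abstract class SketchIDF<T> {
    // Non-final on purpose: a final field could not be reassigned by the
    // setter below, which would force a new instance for every row.
    private String text;

    public final String getCSVTextData() {
        return text;
    }

    public final void setCSVTextData(String text) {
        this.text = text;
    }

    public abstract void setData(T data);

    public abstract T getData();
}

// Hypothetical child that keeps its own internal representation of the row.
class ArraySketchIDF extends SketchIDF<Object[]> {
    private Object[] row;
    int conversions = 0; // counts how often a row was turned into text

    @Override
    public void setData(Object[] data) {
        this.row = data;
        // Eager conversion: paid on every call, even if nobody reads the text.
        // Naive join without quoting or escaping, for illustration only.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < data.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(data[i]);
        }
        conversions++;
        setCSVTextData(sb.toString());
    }

    @Override
    public Object[] getData() {
        // Nothing tells this class that setCSVTextData() was called directly,
        // so the array returned here can be out of sync with the text.
        return row;
    }
}

class IdfSketchDemo {
    public static void main(String[] args) {
        ArraySketchIDF idf = new ArraySketchIDF();
        idf.setData(new Object[] {1, "a"});
        idf.setCSVTextData("2,b"); // bypasses the internal representation

        System.out.println(idf.getCSVTextData()); // text now says 2,b
        System.out.println(idf.getData()[0]);     // array still holds 1
        System.out.println(idf.conversions);      // one conversion already paid
    }
}
```

In this sketch the text and the array disagree after the direct call to 
{{setCSVTextData()}}, and the conversion to text happens on every 
{{setData()}} call whether or not the text is ever read.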

Let me know if that makes any sense.

> IDF API changes
> ---------------
>
>                 Key: SQOOP-1811
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1811
>             Project: Sqoop
>          Issue Type: Sub-task
>          Components: sqoop2-framework
>            Reporter: Veena Basavaraj
>             Fix For: 1.99.5
>
>
> 1. Update the Javadocs for the IDF APIs.
> 2. Make getTextData final and rename the accessors to getCSV and setCSV, so 
> it is obvious that we want to enforce the CSV format.
> The following code can move to the base class IntermediateDataFormat and be 
> made final, so there is no way to override it and we can force all 
> implementations to return String instead of the generic T:
> {code}
> // hold the string in IDF base class
>  private final String text;
>  
>   public final String getCSVTextData() {
>     return text;
>   }
>  
>   public final void setCSVTextData(String text) {
>     this.text = text;
>   }
> {code}
> There is code in the CSVIDF implementation that holds the rules for CSV 
> parsing; it can be pulled out into CSV utils so that the connectors can use 
> it.
> The T in the CSV IDF happens to be String, which is just a coincidence. If I 
> write a new IDF implementation, T can be a custom object that encapsulates 
> the whole row.
> Third, getData and setData can have custom implementations, so they can be 
> overridden to return the generic type T.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
