[ 
https://issues.apache.org/jira/browse/SQOOP-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14271522#comment-14271522
 ] 

Veena Basavaraj commented on SQOOP-1811:
----------------------------------------

See my comments in the IDF wiki, 
https://cwiki.apache.org/confluence/display/SQOOP/Intermediate+Data+Format+API#IntermediateDataFormatAPI-FoodforThought.?,
 which are more like shortcomings of the current design.

Even though we say CSV IDF, there is no way to infer that the format used for 
transfer is text. A connector on the TO side or on the FROM side can still use 
the object array, and the resulting conversions badly hurt performance.

Something else to note: I created a ticket for this yesterday, 
https://issues.apache.org/jira/browse/SQOOP-1989

Even though you say use text, the matching code uses getObjectData and 
setObjectData, so a conversion happens no matter what you ask for.
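
To make the cost concrete, here is a rough sketch of the two paths. Everything 
in it is hypothetical: the class and method names are stand-ins and not the 
real Sqoop classes, and the CSV handling is a naive split that ignores the 
quoting and escaping rules of the CSV IDF. It only shows that going through 
object data adds a parse and a format step per row, while a text-only path 
adds none.

```java
import java.util.StringJoiner;

// Hypothetical stand-in for illustration only; not the real Sqoop IDF classes.
class TextPathSketch {
    // Counts conversions so the cost of each path is visible.
    static int conversions = 0;

    // Naive CSV parse (assumption: no quoting or escaping in the row).
    static Object[] toObjects(String csv) {
        conversions++;
        return csv.split(",", -1);
    }

    // Naive CSV format, the inverse of toObjects.
    static String toCsv(Object[] fields) {
        conversions++;
        StringJoiner joiner = new StringJoiner(",");
        for (Object field : fields) {
            joiner.add(String.valueOf(field));
        }
        return joiner.toString();
    }

    // Path taken when matching goes through object data:
    // text -> Object[] -> text, even if both connectors only want text.
    static String viaObjectData(String csv) {
        Object[] row = toObjects(csv);
        return toCsv(row);
    }

    // Path a text-only transfer could take: the row is handed over untouched.
    static String viaText(String csv) {
        return csv;
    }

    public static void main(String[] args) {
        String row = "1,foo,2.5";
        viaObjectData(row);
        System.out.println("conversions via object data: " + conversions);
        conversions = 0;
        viaText(row);
        System.out.println("conversions via text: " + conversions);
    }
}
```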

Lastly, I would prefer not to add these options on the command line, but to 
make it a config that a connector exposes on the FROM side to force text only. 
That way we do not create an explosion of options, like "no validate", that 
make no sense for the majority of connectors and are only relevant to 
fast-dump connectors.


Also, please open a new ticket. This ticket is closed, and continuing the 
conversation here will not help if we want to propose a change.


IMO, the real way to optimize for fast-dump connectors is to have a config to 
choose text only on both FROM and TO, and to let the user configure it 
explicitly while creating the job. If you read the design proposal of 
SQOOP-1168, it is along the same lines, for specifying delta updates.
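
As a rough sketch of what I mean, and not a proposal for the actual API: the 
enum, the config class and the resolve method below are all made-up names, and 
the fallback to the object array when either side does not ask for text is my 
assumption. The point is only that the choice lives in the job config of each 
direction and text is used end to end only when both sides opt in.

```java
// Hypothetical sketch; none of these types exist in Sqoop.
enum TransferFormat { TEXT, OBJECT_ARRAY }

class TransferFormatSketch {
    // Per-direction setting a connector could expose in its job config.
    static final class DirectionConfig {
        final boolean forceText;

        DirectionConfig(boolean forceText) {
            this.forceText = forceText;
        }
    }

    // Text is chosen only when both FROM and TO ask for it explicitly;
    // otherwise fall back to the object array (assumed default).
    static TransferFormat resolve(DirectionConfig from, DirectionConfig to) {
        if (from.forceText && to.forceText) {
            return TransferFormat.TEXT;
        }
        return TransferFormat.OBJECT_ARRAY;
    }

    public static void main(String[] args) {
        DirectionConfig from = new DirectionConfig(true);
        DirectionConfig to = new DirectionConfig(true);
        System.out.println(resolve(from, to));
    }
}
```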

> Sqoop2: IDF API changes
> -----------------------
>
>                 Key: SQOOP-1811
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1811
>             Project: Sqoop
>          Issue Type: Sub-task
>          Components: sqoop2-framework
>            Reporter: Veena Basavaraj
>            Assignee: Veena Basavaraj
>             Fix For: 1.99.5
>
>         Attachments: SQOOP-1811.patch
>
>
> 1. Update the Javadocs for the IDF APIs.
> 2. Make getTextData final and call it getCSV and setCSV, so it is 
> obvious that we want to enforce the CSV format.
>  The following code can move to the base class IntermediateDataFormat and be 
> made final, so there is no way to override it and we can enforce that all 
> implementations return String instead of the generic T.
> {code}
> // hold the string in the IDF base class
>  private final String text;
>  
>   public final String getCSVTextData() {
>     return text;
>   }
>  
>   public final void setCSVTextData(String text) {
>     this.text = text;
>   }
> {code}
> There is code in the CSVIDF implementation that has the rules for CSV parsing; 
> it can be pulled out into CSV utils so that the connectors can use it.
> The T in CSV happens to be String, which is just a coincidence. If I write a 
> new IDF implementation, T can be a custom object that encapsulates the whole 
> row.
> Third, getData and setData can have custom implementations, so they can be 
> overridden to return the generic type T.
> Correction :
> {code}
> // hold the string in the IDF base class; the field is not final
>  private String text;
>  
>   public final String getCSVTextData() {
>     return text;
>   }
>  
>   public final void setCSVTextData(String text) {
>     this.text = text;
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
