[
https://issues.apache.org/jira/browse/SQOOP-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14271522#comment-14271522
]
Veena Basavaraj commented on SQOOP-1811:
----------------------------------------
See my comments in the IDF wiki,
https://cwiki.apache.org/confluence/display/SQOOP/Intermediate+Data+Format+API#IntermediateDataFormatAPI-FoodforThought.?,
which are more like shortcomings of the current design.
Even though we call it the CSV IDF, there is no way to infer that the format
used for transfer is text. A connector on the TO side or on the FROM side can
use the object array instead, which results in bad performance.
Something else to note: I created a ticket yesterday,
https://issues.apache.org/jira/browse/SQOOP-1989
Even though you say to use text, the matching code uses getObjectData and
setObjectData, so there is a conversion no matter what you say.
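To illustrate the point about conversion, here is a minimal sketch. It assumes
a hypothetical stand-in class (SketchCsvIdf) with a naive comma split; it is
not the real Sqoop IDF or matching code, and the counters exist only to make
the extra work visible.

```java
// Hypothetical stand-in for a CSV-backed IDF; not the real Sqoop class.
class SketchCsvIdf {
    private String text;
    int parseCount = 0;   // times the text was split into an object array
    int encodeCount = 0;  // times an object array was re-encoded as text

    String getCSVTextData() { return text; }

    void setCSVTextData(String text) { this.text = text; }

    // Naive split on commas; real CSV rules (quoting, escaping) are ignored here.
    Object[] getObjectData() {
        parseCount++;
        return text.split(",", -1);
    }

    void setObjectData(Object[] fields) {
        encodeCount++;
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) sb.append(',');
            sb.append(fields[i]);
        }
        text = sb.toString();
    }
}

class MatchingSketch {
    // Assumed shape of the current path: always go through object arrays.
    static void copyViaObjects(SketchCsvIdf from, SketchCsvIdf to) {
        to.setObjectData(from.getObjectData());
    }

    // Text-only path: hand the CSV string over untouched.
    static void copyViaText(SketchCsvIdf from, SketchCsvIdf to) {
        to.setCSVTextData(from.getCSVTextData());
    }

    public static void main(String[] args) {
        SketchCsvIdf from = new SketchCsvIdf();
        from.setCSVTextData("1,alice,42");

        SketchCsvIdf viaObjects = new SketchCsvIdf();
        copyViaObjects(from, viaObjects);
        System.out.println("object path parses: " + from.parseCount
                + ", encodes: " + viaObjects.encodeCount);

        SketchCsvIdf viaText = new SketchCsvIdf();
        copyViaText(from, viaText);
        System.out.println("text path encodes: " + viaText.encodeCount);
    }
}
```

Both paths end with the same text on the TO side, but the object path pays for
one parse and one re-encode per record.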
Lastly, I would prefer not to add these options on the command line, but to
make it a config that a connector exposes on the FROM side to force text only.
This way we don't create an explosion of options, such as "no validate", that
do not make sense for the majority of connectors and are only relevant to
fast-dump connectors.
Also, please open a new ticket. This ticket is closed, and continuing the
conversation here will not help if we want to propose a change.
IMO, the real way to optimize for fast-dump connectors is to have a config to
choose text only on both FROM and TO, and to let the user explicitly configure
it while creating the job. The design proposal in SQOOP-1168 is along the same
lines, for specifying delta updates.
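A rough sketch of what such a config could look like follows. Every name in it
(TransferFormat, TextOnlyConfig, TransferFormatChooser) is hypothetical and
does not exist in Sqoop; the only idea taken from the comment above is that
text is used only when the user has set it explicitly on both FROM and TO.

```java
// Hypothetical sketch: none of these types exist in Sqoop.
enum TransferFormat { TEXT, OBJECT_ARRAY }

// Per-direction job config that a connector could expose (assumed name).
class TextOnlyConfig {
    final boolean textOnly;

    TextOnlyConfig(boolean textOnly) {
        this.textOnly = textOnly;
    }
}

class TransferFormatChooser {
    // Use the text path only when both sides were explicitly set to text only;
    // otherwise fall back to the object array path.
    static TransferFormat choose(TextOnlyConfig from, TextOnlyConfig to) {
        return (from.textOnly && to.textOnly)
                ? TransferFormat.TEXT
                : TransferFormat.OBJECT_ARRAY;
    }

    public static void main(String[] args) {
        TextOnlyConfig fastDumpFrom = new TextOnlyConfig(true);
        TextOnlyConfig textTo = new TextOnlyConfig(true);
        TextOnlyConfig objectTo = new TextOnlyConfig(false);

        System.out.println(choose(fastDumpFrom, textTo));
        System.out.println(choose(fastDumpFrom, objectTo));
    }
}
```

Keeping the choice in the job config rather than on the command line means
connectors that do not care about fast dump never see the option.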
> Sqoop2: IDF API changes
> -----------------------
>
> Key: SQOOP-1811
> URL: https://issues.apache.org/jira/browse/SQOOP-1811
> Project: Sqoop
> Issue Type: Sub-task
> Components: sqoop2-framework
> Reporter: Veena Basavaraj
> Assignee: Veena Basavaraj
> Fix For: 1.99.5
>
> Attachments: SQOOP-1811.patch
>
>
> 1. Update the Javadocs for the IDF APIs.
> 2. Make getTextData final and rename the accessors to getCSV and setCSV, so
> it is obvious that we want to enforce the CSV format.
> The following code can move to the base class IntermediateDataFormat and be
> made final, so there is no way to override it and we can enforce that all
> implementations return String instead of the generic T.
> {code}
> // hold the string in IDF base class
> private final String text;
>
> public final String getCSVTextData() {
>   return text;
> }
>
> public final void setCSVTextData(String text) {
>   this.text = text;
> }
> {code}
> There is code in the CSVIDF implementation that has the rules for CSV parsing;
> it can be pulled out into CSV utils so that the connectors can use it.
> The T in CSV happens to be String, which is just a coincidence. If I write a
> new IDF implementation, T can be a custom object that encapsulates the whole
> row.
> Third, getData and setData can have custom implementations, so they can be
> overridden to return the generic type T.
> Correction:
> {code}
> // hold the string in IDF base class; the field is not final
> private String text;
>
> public final String getCSVTextData() {
>   return text;
> }
>
> public final void setCSVTextData(String text) {
>   this.text = text;
> }
> {code}