[
https://issues.apache.org/jira/browse/SQOOP-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14228795#comment-14228795
]
Veena Basavaraj commented on SQOOP-1811:
----------------------------------------
+1 on adding a link to the doc. I am holding off on finishing the doc until the
actual CSV implementation is completed.
Another suggestion: we can also call it SqoopCSV, and add a comment pointing to
SqoopCSVUtils as helpers.
Second part: yes, the IDF will have a custom implementation. A naive example I
like to use is JSON. If we have a JSONIntermediateDataFormat where the entire
row is one JSON object, one way of implementing it is for getObjectData() to
return an array of size 1, for getData to return the JSON object (in this case
T is just a JSONObject), and for getCSVText to still return the Sqoop-dictated
CSV string.
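A minimal sketch of that JSON case, under stated assumptions: the base class
below is a hypothetical stand-in for the IDF base class (not the actual Sqoop
API), a Map stands in for JSONObject so the example needs no external library,
the accessor is named getCSVTextData as in the ticket description, and the
quoting rules are simplified placeholders rather than the real Sqoop CSV rules.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical stand-in for the IDF base class; the real Sqoop class may differ.
abstract class IntermediateDataFormatSketch<T> {
    protected T data;

    // Native representation of the row, typed by the implementation.
    public T getData() { return data; }
    public void setData(T data) { this.data = data; }

    public abstract Object[] getObjectData();
    public abstract String getCSVTextData();
}

// T is a Map here, standing in for a JSONObject that holds the entire row.
class JsonIdfSketch extends IntermediateDataFormatSketch<Map<String, Object>> {
    @Override
    public Object[] getObjectData() {
        // The whole row is one JSON object, so the array has a single item.
        return new Object[] { data };
    }

    @Override
    public String getCSVTextData() {
        // Regardless of T, the text form must still be a CSV string.
        StringJoiner csv = new StringJoiner(",");
        for (Object value : data.values()) {
            csv.add(encode(value));
        }
        return csv.toString();
    }

    // Simplified, assumed encoding: numbers bare, strings single-quoted, null as NULL.
    private static String encode(Object value) {
        if (value == null) {
            return "NULL";
        }
        if (value instanceof Number) {
            return value.toString();
        }
        String escaped = value.toString().replace("\\", "\\\\").replace("'", "\\'");
        return "'" + escaped + "'";
    }
}

class JsonIdfDemo {
    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("id", 1);
        row.put("name", "veena");
        row.put("note", null);

        JsonIdfSketch idf = new JsonIdfSketch();
        idf.setData(row);
        System.out.println(idf.getObjectData().length); // 1
        System.out.println(idf.getCSVTextData());       // 1,'veena',NULL
    }
}
```

The point of the sketch is only that getData and getObjectData can be shaped
around T, while the CSV accessor keeps the same contract for every IDF.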
But in this case as well,
Third, after spending 2 weeks in the 1350 sub-tickets, I totally understand what
the text format is trying to solve, but I would be surprised to hear that text
is preferred. Avro and Parquet are much more prominent. Anyway, without any
concrete data on the usage, there is no point in debating, and as always, there
is room to add an AvroIDF/ParquetIDF, and connectors can use setData and getData
directly in that case. This also makes me think that a connector exposing only
one IDF is restrictive; it should be able to work with more than one, e.g. via a
list of supported IDFs (in the SqoopConnector API), but that is a different
story for now.
> IDF API changes
> ---------------
>
> Key: SQOOP-1811
> URL: https://issues.apache.org/jira/browse/SQOOP-1811
> Project: Sqoop
> Issue Type: Sub-task
> Components: sqoop2-framework
> Reporter: Veena Basavaraj
> Fix For: 1.99.5
>
>
> 1. Update the Javadocs for the IDF APIs.
> 2. Make getTextData final and rename the pair to getCSV and setCSV, so it is
> obvious that we want to enforce the CSV format.
> The following code can move to the base class IntermediateDataFormat and
> be made final, so there is no way to override it and we can enforce that all
> implementations return String instead of the generic T:
> {code}
> // Hold the CSV string in the IDF base class.
> private String text;
>
> public final String getCSVTextData() {
>   return text;
> }
>
> public final void setCSVTextData(String text) {
>   this.text = text;
> }
> {code}
> There is code in the CSVIDF implementation that holds the rules for CSV
> parsing; it can be pulled out into CSV utils so that the connectors can use it.
> The T in CSV happens to be String, which is just a coincidence. If I write a
> new IDF implementation, T can be a custom object that encapsulates the whole
> row.
> Third, getData and setData can have custom implementations, so they can be
> overridden to return the generic type T.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)