[
https://issues.apache.org/jira/browse/SQOOP-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230198#comment-14230198
]
Jarek Jarcec Cecho commented on SQOOP-1811:
-------------------------------------------
My apologies for the confusion [~vybs]. The IDF implementation is required to
provide the methods {{getSqoopCSVData()}} and {{setSqoopCSVData()}} to work with
the CSV-ish format. However, Sqoop doesn't have to call these methods in every
case, and hence the conversion should be "lazy" (performed on demand). E.g. there
might be a scenario where Sqoop, when working with Connectors 1 and 2, decides to
use the {{getData()}} and {{setData()}} methods (both connectors are using the
same IDF implementation and we are not required to do any conversions) and at the
same time, for Connectors 1 and 3, decides to use {{getSqoopCSVData()}} and
{{setSqoopCSVData()}} (because text happens to be the best format to move data
around). The first example should impose no CPU penalty for converting into
the CSV-ish format, as in that case the CSV-ish format is not used.
{quote}
1. We have agreed to rename the API to getSqoopCSVData() and setSqoopCSVData().
I propose making this final and moving the text field to the base class (IDF).
My understanding was that every IDF implementation will have to provide this
string.
{quote}
I think that the suggestion to rename {{getTextData()}} and {{setTextData()}}
to {{getSqoopCSVData()}} and {{setSqoopCSVData()}} is a good idea and I
support it. I still don't see value in defining either of those methods
as final and/or requiring a {{text}} field in the
{{IntermediateDataFormat}} base class.
{quote}
2. getData() and getObjectData() even though I am not sure why both will be
needed in all cases. So in case of AvroIDF why would we need both.
getData() and setData() would set a Avro object that represents the row
{quote}
I don't think that all methods will be used during all data transfers. That is
also why we are defining those methods as abstract and why the implementation
should be "lazy". I believe that your example is correct. To sum it up, for
AvroIDF:
* {{getData()}} will return the Avro object representing the row
* {{getObjectData()}} will convert the internal Avro object to return "plain"
Java objects
* {{getSqoopCSVData()}} will convert the internal Avro object to return CSV-ish
text
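The three accessors above could be sketched roughly as follows. This is an illustration under stated assumptions, not the real API: {{IdfSketch}} and {{RecordIdfSketch}} are hypothetical names, a {{LinkedHashMap}} stands in for the Avro object so the example needs no external library, {{Object[]}} is assumed as the "plain" Java representation, the CSV-ish encoding is simplified (single-quoted strings, no escaping), and the object and CSV setters are omitted for brevity.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical abstract contract; the real IntermediateDataFormat may differ.
abstract class IdfSketch<T> {
    abstract T getData();
    abstract void setData(T data);
    abstract Object[] getObjectData();
    abstract String getSqoopCSVData();
}

// Stand-in for AvroIDF: a LinkedHashMap plays the role of the Avro record.
class RecordIdfSketch extends IdfSketch<Map<String, Object>> {
    private Map<String, Object> record = new LinkedHashMap<>();

    @Override
    Map<String, Object> getData() {
        return record;                     // native object, no conversion
    }

    @Override
    void setData(Map<String, Object> data) {
        this.record = data;
    }

    @Override
    Object[] getObjectData() {
        return record.values().toArray();  // converted only when requested
    }

    @Override
    String getSqoopCSVData() {
        // Simplified CSV-ish encoding (an assumption), built only when requested.
        StringJoiner csv = new StringJoiner(",");
        for (Object value : record.values()) {
            csv.add(value instanceof String ? "'" + value + "'" : String.valueOf(value));
        }
        return csv.toString();
    }
}
```

Because every accessor is abstract in the base class, nothing forces an implementation to keep a {{text}} field around; each implementation decides how and when to derive the other representations from its native one.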
> IDF API changes
> ---------------
>
> Key: SQOOP-1811
> URL: https://issues.apache.org/jira/browse/SQOOP-1811
> Project: Sqoop
> Issue Type: Sub-task
> Components: sqoop2-framework
> Reporter: Veena Basavaraj
> Fix For: 1.99.5
>
>
> 1. update the java docs for IDF apis.
> 2. Make the getTextData final and call it getCSV and setCSV, so it is
> obvious that we want to enforce CSV format
> the following code can move to the base class IntermediateDataFormat and
> made final, so there is no way to override this and we can enforce all to
> return String instead of generic T
> {code}
> // hold the string in IDF base class
> private final String text;
>
> public final String getCSVTextData() {
> return text;
> }
>
> public final void setCSVTextData(String text) {
> this.text = text;
> }
> {code}
> There is code in the CSVIDF implementation that has the rules for CSV parsing
> that can be pulled out into CSV Utils so that the connectors can use it.
> The T in CSV happens to be String, which is just a coincidence. If I write a new
> IDF implementation, T can be a custom object that could encapsulate the whole
> row.
> Third, getData and setData can have custom implementations so they can be
> overridden to return the generic type T
> Correction :
> {code}
> // hold the string in IDF base class, is !final
> private String text;
>
> public final String getCSVTextData() {
> return text;
> }
>
> public final void setCSVTextData(String text) {
> this.text = text;
> }
> {code}