[
https://issues.apache.org/jira/browse/SQOOP-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230073#comment-14230073
]
Jarek Jarcec Cecho commented on SQOOP-1811:
-------------------------------------------
The idea behind {{IntermediateDataFormat}} is to let a connector use an
arbitrary format for its data. That is an abstract concept, and Sqoop doesn't
impose any restrictions on how the connector represents the data
internally. The "internal" format will be whatever is native
to the given connector. JDBC-based connectors will likely use CSV or an Object
array; fast connectors will most likely end up with CSV. More advanced
connectors might use more advanced structures. We have the abstract template
methods {{getData()}} and {{setData()}} so that we can do some basic work with
this abstract structure.
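
To make the "native format" idea concrete, here is a minimal sketch. The class names ({{SketchDataFormat}}, {{ObjectArrayFormat}}, {{CsvLineFormat}}) are hypothetical and are not the actual Sqoop classes; the accessors are shown as plain generic getters and setters, which is an assumption made for brevity.

```java
// Hypothetical sketch, not the real IntermediateDataFormat class.
// T is whatever structure is native to the connector.
abstract class SketchDataFormat<T> {
    private T data;

    // Returns the connector's native representation, uninterpreted.
    public T getData() {
        return data;
    }

    // Stores the connector's native representation as-is.
    public void setData(T data) {
        this.data = data;
    }
}

// A JDBC-style connector might keep each row as an Object array.
class ObjectArrayFormat extends SketchDataFormat<Object[]> {
}

// A "fast" connector might keep each row as a raw CSV line.
class CsvLineFormat extends SketchDataFormat<String> {
}
```

With only these two methods the rest of the code base can pass the data around, but it cannot tell where one column ends and the next begins.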
However, in order to interpret the data we need the IDF to expose methods that
convert the internal format to something that is agreed upon between the
IDF and the rest of the Sqoop code base, so that instead of one abstract object
we get data that we can interpret in terms of columns and their values. We are
requesting the IDF to expose that via two different method families. The first
is an Object representation via the methods {{getObjectData()}} and
{{setObjectData()}}, which is expected to return Java objects corresponding to
the column values. The second is a CSV-ish representation that follows very
strict formatting rules. I've tried to explain why we are requesting both
formats in my [earlier
comment|https://issues.apache.org/jira/browse/SQOOP-1811?focusedCommentId=14226919&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14226919].
As a result, the text format in the IDF is intentionally required: we are
requesting the IDF to expose the text format to us, and the methods are defined
as abstract so that the IDF converts the internal structure into text lazily
(only if we need the text representation). {{getObjectData()}} and
{{setObjectData()}} work the same way. I'm wondering what different goals we
are trying to achieve here?
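
As an illustration of the lazy conversion, here is a minimal sketch. Everything in it is hypothetical: the names {{SketchRowFormat}} and {{ObjectArrayRowFormat}}, the {{textConversions}} counter, and above all the toy text encoding (single-quoted strings, {{NULL}} for null, no escaping), which is much simpler than the strict CSV rules Sqoop actually requires.

```java
import java.util.StringJoiner;

// Hypothetical base class: one native structure plus two agreed-upon views.
abstract class SketchRowFormat<T> {
    protected T data;

    public T getData() { return data; }
    public void setData(T data) { this.data = data; }

    // Object view: one Java object per column.
    public abstract Object[] getObjectData();
    public abstract void setObjectData(Object[] values);

    // Text view: abstract, so each format converts only on demand.
    public abstract String getTextData();
    public abstract void setTextData(String text);
}

// A format whose native structure is already an Object array.
class ObjectArrayRowFormat extends SketchRowFormat<Object[]> {
    int textConversions = 0; // illustration only: counts native-to-text conversions

    @Override
    public Object[] getObjectData() { return data; }

    @Override
    public void setObjectData(Object[] values) { data = values; }

    @Override
    public String getTextData() {
        textConversions++; // the conversion cost is paid here, not in the setters
        StringJoiner joiner = new StringJoiner(",");
        for (Object value : data) {
            if (value == null) {
                joiner.add("NULL");
            } else if (value instanceof String) {
                joiner.add("'" + value + "'"); // toy quoting, no escaping
            } else {
                joiner.add(String.valueOf(value));
            }
        }
        return joiner.toString();
    }

    @Override
    public void setTextData(String text) {
        // Toy parser: assumes no commas or quotes inside values,
        // and treats every unquoted field as a long.
        String[] fields = text.split(",", -1);
        Object[] values = new Object[fields.length];
        for (int i = 0; i < fields.length; i++) {
            String field = fields[i];
            if (field.equals("NULL")) {
                values[i] = null;
            } else if (field.length() >= 2 && field.startsWith("'") && field.endsWith("'")) {
                values[i] = field.substring(1, field.length() - 1);
            } else {
                values[i] = Long.valueOf(field);
            }
        }
        data = values;
    }
}
```

A caller that only ever uses {{getObjectData()}} never triggers the text conversion; the text is built only when {{getTextData()}} is called.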
> IDF API changes
> ---------------
>
> Key: SQOOP-1811
> URL: https://issues.apache.org/jira/browse/SQOOP-1811
> Project: Sqoop
> Issue Type: Sub-task
> Components: sqoop2-framework
> Reporter: Veena Basavaraj
> Fix For: 1.99.5
>
>
> 1. Update the Javadocs for the IDF APIs.
> 2. Make getTextData final and rename the pair to getCSV and setCSV, so it is
> obvious that we want to enforce the CSV format.
> The following code can move to the base class IntermediateDataFormat and
> be made final, so there is no way to override it and we can enforce that all
> implementations return String instead of the generic T:
> {code}
> // Hold the string in the IDF base class. The field cannot be final
> // because setCSVTextData() assigns it.
> private String text;
>
> public final String getCSVTextData() {
> return text;
> }
>
> public final void setCSVTextData(String text) {
> this.text = text;
> }
> {code}
> There is code in the CSVIDF implementation that has the rules for CSV
> parsing; it can be pulled out into CSV utils so that the connectors can use it.
> The T in the CSV IDF happens to be String, which is just a coincidence. If I
> write a new IDF implementation, T can be a custom object that encapsulates the
> whole row.
> Third, getData and setData can have custom implementations, so they can be
> overridden to return the generic type T.