> On Dec. 14, 2014, 9:06 a.m., Jarek Cecho wrote:
> > connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormat.java, lines 70-72
> > <https://reviews.apache.org/r/29025/diff/3/?file=791537#file791537line70>
> >
> > I'm wondering what the motivation for this change is? It seems to me
> > that using T is much better than a generic Object.
> 
> Veena Basavaraj wrote:
>     Is it not obvious? The actual object can be anything other than a
>     String.
> 
> Veena Basavaraj wrote:
>     Maybe it is not obvious. Let me shed some more light. :)
>     
>     In
>     
>         private class SqoopMapDataWriter extends DataWriter {
>     
>     we need to do
>     
>         @Override
>         public void writeRecord(Object obj) {
>           fromIDF.setData(obj);
>           writeContent();
>         }
>     
>     instead of the current
>     
>         @Override
>         public void writeRecord(Object obj) {
>           fromIDF.setData(obj.toString());
>           writeContent();
>         }
> 
> Jarek Cecho wrote:
>     I agree that writeRecord(Object) should be calling setData(obj)
>     rather than doing the string conversion. So if I understand it
>     correctly, this change is then not needed, correct?
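[Editor's note: the disagreement above comes down to what the caller holds. DataWriter.writeRecord receives an Object, so a String-typed setData forces a lossy toString() conversion, while an Object-typed setter passes the record through unchanged. A minimal, illustrative sketch of that difference; the class names here are stand-ins, not the real Sqoop API.]

```java
import java.util.Arrays;
import java.util.List;

// Minimal stand-in for IntermediateDataFormat<T>.
abstract class IDF<T> {
    protected T data;
    public void setData(T d) { data = d; }
    public T getData() { return data; }
}

// String-typed IDF: a writeRecord(Object) caller must degrade to toString().
class StringIDF extends IDF<String> { }

// Object-typed setter: the writer can pass the record through unchanged.
class ObjectIDF extends IDF<Object> { }

public class ErasureDemo {
    public static void main(String[] args) {
        List<Integer> record = Arrays.asList(1, 2, 3);

        StringIDF s = new StringIDF();
        s.setData(record.toString());   // forced conversion, original type lost

        ObjectIDF o = new ObjectIDF();
        o.setData(record);              // original object preserved as-is
        // Reference equality: the IDF holds the very same instance.
        System.out.println(o.getData() == record);
    }
}
```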
The change exists because we need to make this work; how can T be
recognised here? Try the code out yourself: apply the patch and suggest a
better way if you see one.


> On Dec. 14, 2014, 9:06 a.m., Jarek Cecho wrote:
> > execution/mapreduce/src/main/java/org/apache/sqoop/job/io/SqoopWritable.java, line 34
> > <https://reviews.apache.org/r/29025/diff/3/?file=791543#file791543line34>
> >
> > I'm wondering here. The SqoopWritable can (and probably will) be used
> > in both the "From" and "To" contexts, right? So we should probably load
> > the correct IDF and not assume that it's "To".
> 
> Veena Basavaraj wrote:
>     When it is, we can get there. At this point it is used only as the
>     writable for the loader, and I would rather make that clear than be
>     forward-thinking for reasons that might not even be relevant now.
> 
> Veena Basavaraj wrote:
>     I even thought of renaming it to ToWritable, but I know some people
>     on this project consider renames the only thing I do, so I wanted to
>     spare us that discussion in the RB.
> 
> Jarek Cecho wrote:
>     I believe that currently the same writable is used for both From and
>     To, hence my comment. E.g. extracted rows will get populated on the
>     mapper side with the "From" IDF, right? It seems to me that
>     separating the IDFs for From/To is actually slightly more
>     complicated, as we have to be careful about what each part of the
>     workflow expects. There will be only one Writable between the
>     Mapper and the Reducer/OutputFormat, and I would assume that the one
>     writable will be written with the "From" context and read by the
>     "To" context. It seems to me that we need to put the logic that does
>     the conversion from "From" to "To" somewhere, so that we end up with
>     only one writable with one IDF. Or are we doing that already and
>     I've just missed it? The 5-line comment made me a bit dizzy.
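[Editor's note: the pattern being debated can be sketched as follows: one writable-style wrapper delegates (de)serialization to whichever IDF it encapsulates, and the matching step bridges the "From" data into the "To" IDF before anything is written. All class and method names below are illustrative stand-ins, not the real Sqoop or Hadoop API.]

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.function.Function;

// Stand-in for an IDF that exposes its payload as CSV text.
class TextIDF {
    private String csv;
    public void setCSVTextData(String s) { csv = s; }
    public String getCSVTextData() { return csv; }
}

// Writable-style wrapper that delegates to the IDF it encapsulates,
// so a single wrapper class can serve both ends of the shuffle.
class IdfWritable {
    private final TextIDF idf;
    IdfWritable(TextIDF idf) { this.idf = idf; }
    public void write(DataOutputStream out) throws IOException {
        out.writeUTF(idf.getCSVTextData());
    }
    public void readFields(DataInputStream in) throws IOException {
        idf.setCSVTextData(in.readUTF());
    }
}

public class SingleWritableDemo {
    public static void main(String[] args) throws IOException {
        // Hypothetical matcher: converts a From-schema row to the To schema.
        Function<String, String> matcher = row -> row.toUpperCase();

        // Mapper side: the From IDF only feeds the matcher; the To IDF is
        // what the writable actually carries.
        TextIDF fromIDF = new TextIDF();
        fromIDF.setCSVTextData("a,b");
        TextIDF toIDF = new TextIDF();
        toIDF.setCSVTextData(matcher.apply(fromIDF.getCSVTextData()));

        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        new IdfWritable(toIDF).write(new DataOutputStream(buf));

        // Loader side: the same wrapper class reads back into a To IDF.
        TextIDF loaded = new TextIDF();
        new IdfWritable(loaded).readFields(
            new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        System.out.println(loaded.getCSVTextData());
    }
}
```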
If you have not followed the code closely, I suggest looking at the Sqoop
Mapper and the DataWriter there. At first this code left me dizzy too, so
I have added comments in the mapper:

    private void writeContent() {
      try {
        if (LOG.isDebugEnabled()) {
          LOG.debug("Extracted data: " + fromIDF.getCSVTextData());
        }
        // NOTE: The fromIDF and the corresponding fromSchema are used only
        // for the matching process. The output of the mappers is finally
        // written to the toIDF object after the matching process, since
        // the writable encapsulates the toIDF ==> new SqoopWritable(toIDF)
        toIDF.setObjectData(matcher.getMatchingData(fromIDF.getObjectData()));
        // NOTE: We do not use the reducer to do the writing (a.k.a. LOAD
        // in ETL). Hence the mapper sets up the writable.
        context.write(writable, NullWritable.get());
      } catch (Exception e) {
        throw new SqoopException(MRExecutionError.MAPRED_EXEC_0013, e);
      }
    }

The current code does not use this class for anything but the loader/TO
part. So if you propose reusing it for both, please create a ticket for
that.


> On Dec. 14, 2014, 9:06 a.m., Jarek Cecho wrote:
> > connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java, line 322
> > <https://reviews.apache.org/r/29025/diff/3/?file=791536#file791536line322>
> >
> > I'm wondering what the motivation for this change is? I'm assuming
> > that we want to allow comparing to any existing IDF (albeit comparing
> > different IDFs will always throw an exception), so why restrict the
> > usage to only those that use String as their internal object?
> 
> Veena Basavaraj wrote:
>     See below; there is strong typing for a reason.
> 
> Veena Basavaraj wrote:
>     Even better would be to change this to the actual
>     CSVIntermediateDataFormat.
> 
> Veena Basavaraj wrote:
>     Also, why do we need the compareTo at all? Can't the default work?
>         public int compareTo(CSVIntermediateDataFormat o) {
>           if (this == o) {
>             return 0;
>           }
>           if (this.equals(o)) {
>             return 0;
>           }
>           return data.compareTo(o.getCSVTextData());
>         }
> 
> Jarek Cecho wrote:
>     It seems that we are not using the compareTo() method anywhere in
>     the code, so both the current way and
>     compareTo(CSVIntermediateDataFormat o) work for me. The middle way
>     with IntermediateDataFormat<String> seems strange, though.

I suggest removing the dead code; it seems superfluous when there is no
extra logic in it.

- Veena


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29025/#review65049
-----------------------------------------------------------


On Dec. 14, 2014, 10:42 a.m., Veena Basavaraj wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/29025/
> -----------------------------------------------------------
> 
> (Updated Dec. 14, 2014, 10:42 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-1882
>     https://issues.apache.org/jira/browse/SQOOP-1882
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> See JIRA.
> 
> It fixes a few things:
> 
> JobManager currently ignores the TO connector IDF.
> Sqoop Mapper IDF cannot assume the type to be String.
> Using a Set for jars rather than a List, so we can avoid duplicates.
> 
> 
> Diffs
> -----
> 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java dbe193d 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormat.java 93698a8 
>   connector/connector-sdk/src/test/java/org/apache/sqoop/connector/idf/TestCSVIntermediateDataFormat.java 83a95ec 
>   core/src/main/java/org/apache/sqoop/driver/JobManager.java 01073d4 
>   core/src/main/java/org/apache/sqoop/driver/JobRequest.java 8c1cc95 
>   execution/mapreduce/src/main/java/org/apache/sqoop/execution/mapreduce/MapreduceExecutionEngine.java 9b3eb44 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/MRJobConstants.java 67021a8 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/io/SqoopWritable.java 336ab97 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopMapper.java 7434243 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopOutputFormatLoadExecutor.java d664337 
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/TestMapReduce.java 256c34d 
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/TestMatching.java 67c8525 
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/mr/TestSqoopOutputFormatLoadExecutor.java d897125 
> 
> Diff: https://reviews.apache.org/r/29025/diff/
> 
> 
> Testing
> -------
> 
> Yes.
> 
> 
> Thanks,
> 
> Veena Basavaraj
> 
>