-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/47108/#review133028
-----------------------------------------------------------
We tested this patch on a sample of our production data (21.4m rows, 716
columns).
* Before patch:
  16/05/12 20:39:36 INFO mapreduce.ExportJobBase: Transferred 8.4101 GB in 4,953.4655 seconds (1.7386 MB/sec)
  16/05/12 20:39:36 INFO mapreduce.ExportJobBase: Exported 21399476 records.
  GC time elapsed (ms)=1745751
  CPU time spent (ms)=238899370
  Physical memory (bytes) snapshot=240646844416
  Virtual memory (bytes) snapshot=491522174976
  Total committed heap usage (bytes)=204771688448
* After patch:
  16/05/12 18:17:36 INFO mapreduce.ExportJobBase: Transferred 8.4101 GB in 744.7664 seconds (11.5633 MB/sec)
  16/05/12 18:17:36 INFO mapreduce.ExportJobBase: Exported 21399476 records.
  GC time elapsed (ms)=1640876
  CPU time spent (ms)=59953350
  Physical memory (bytes) snapshot=319115075584
  Virtual memory (bytes) snapshot=486723493888
  Total committed heap usage (bytes)=281407389696
That's a great improvement: throughput went from 1.74 MB/sec to 11.56 MB/sec (roughly 6.6x faster), and CPU time dropped by about 4x.
- Ruslan Dautkhanov
On May 9, 2016, 7:14 a.m., Attila Szabo wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/47108/
> -----------------------------------------------------------
>
> (Updated May 9, 2016, 7:14 a.m.)
>
>
> Review request for Sqoop.
>
>
> Repository: sqoop-trunk
>
>
> Description
> -------
>
> With the current implementation of ClassWriter, the generated table ORM
> classes contain a setField method built around a long chain of if statements
> (with a single branch for every private field). Although this approach works
> perfectly well for small and mid-size tables (in terms of the number of
> columns), for wide ones (>>500 columns) it causes a significant performance
> degradation (making export much slower than it should be, as seen in the
> JIRA task). I've attached a proposed solution to avoid this. According to my
> own measurements, this solution is 250x faster than the current one (tested
> with 800-column-wide table ORMs and 20,000 / 100,000 / 1M / 5M rows).
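>
> For illustration, roughly the two shapes in question (a simplified sketch,
> not the exact generated code; class and field names here are made up, and
> the map-plus-switch dispatch is just one possible way to avoid the linear
> scan):
>
>     import java.util.HashMap;
>     import java.util.Map;
>
>     public class WideRecordSketch {
>       private Integer col0;
>       private String col1;
>       // ... one private field per column, hundreds of them in a wide table
>
>       // Current shape: one branch per column, so every call does up to
>       // O(#columns) string comparisons.
>       public void setFieldLinear(String name, Object val) {
>         if ("col0".equals(name)) { this.col0 = (Integer) val; }
>         else if ("col1".equals(name)) { this.col1 = (String) val; }
>         // ... ~800 more branches for a wide table
>       }
>
>       // Alternative shape: resolve the column name to an ordinal via a
>       // precomputed map, then dispatch with a switch, so each call is
>       // O(1) on average.
>       private static final Map<String, Integer> FIELD_ORDINALS =
>           new HashMap<String, Integer>();
>       static {
>         FIELD_ORDINALS.put("col0", 0);
>         FIELD_ORDINALS.put("col1", 1);
>         // ... one entry per column
>       }
>
>       public void setFieldMapped(String name, Object val) {
>         Integer ordinal = FIELD_ORDINALS.get(name);
>         if (ordinal == null) { return; } // unknown column
>         switch (ordinal) {
>           case 0: this.col0 = (Integer) val; break;
>           case 1: this.col1 = (String) val; break;
>           // ... one case per column
>           default: break;
>         }
>       }
>     }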
>
> Please review it and share your thoughts!
>
>
> Diffs
> -----
>
> src/java/org/apache/sqoop/orm/ClassWriter.java 23a9c41
> src/java/org/apache/sqoop/orm/CompilationManager.java ce165e8
> src/test/com/cloudera/sqoop/orm/TestClassWriter.java 498db73
>
> Diff: https://reviews.apache.org/r/47108/diff/
>
>
> Testing
> -------
>
> The existing unit test case has been extended with one test method that
> simulates the "insertion" of 20,000 rows (calling all 800 setters 20,000
> times with random values), but I've also tested with 100,000 / 1M / 5M rows
> in my local environment. It showed this solution is at least 250x faster.
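>
> Roughly the shape of that timing loop (sizes and names are illustrative
> only; WideRecordSketch refers to the sketch above, not the generated
> class):
>
>     import java.util.Random;
>
>     public class SetterTimingSketch {
>       public static void main(String[] args) {
>         WideRecordSketch record = new WideRecordSketch();
>         Random random = new Random(42);
>
>         long start = System.nanoTime();
>         for (int row = 0; row < 20000; row++) {
>           // a real run would loop over all ~800 column names here
>           record.setFieldMapped("col0", random.nextInt());
>           record.setFieldMapped("col1", "val" + random.nextInt());
>         }
>         long elapsedMs = (System.nanoTime() - start) / 1_000_000;
>         System.out.println("20000 simulated rows took " + elapsedMs + " ms");
>       }
>     }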
>
> Any additional ideas for testing are more than welcome from the community.
>
>
> Thanks,
>
> Attila Szabo
>
>