-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/47108/#review133028
-----------------------------------------------------------
We tested this patch on a sample of our production data (21.4m rows, 716
columns).
* Before patch:
  16/05/12 20:39:36 INFO mapreduce.ExportJobBase: Transferred 8.4101 GB in 4,953.4655 seconds (1.7386 MB/sec)
  16/05/12 20:39:36 INFO mapreduce.ExportJobBase: Exported 21399476 records.
  GC time elapsed (ms)=1745751
  CPU time spent (ms)=238899370
  Physical memory (bytes) snapshot=240646844416
  Virtual memory (bytes) snapshot=491522174976
  Total committed heap usage (bytes)=204771688448
* After patch:
  16/05/12 18:17:36 INFO mapreduce.ExportJobBase: Transferred 8.4101 GB in 744.7664 seconds (11.5633 MB/sec)
  16/05/12 18:17:36 INFO mapreduce.ExportJobBase: Exported 21399476 records.
  GC time elapsed (ms)=1640876
  CPU time spent (ms)=59953350
  Physical memory (bytes) snapshot=319115075584
  Virtual memory (bytes) snapshot=486723493888
  Total committed heap usage (bytes)=281407389696
That's a great improvement: throughput went from 1.74 MB/sec to 11.56 MB/sec (roughly 6.6x faster), and CPU time dropped by about 4x.
- Ruslan Dautkhanov
On May 9, 2016, 7:14 a.m., Attila Szabo wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/47108/
> -----------------------------------------------------------
>
> (Updated May 9, 2016, 7:14 a.m.)
>
>
> Review request for Sqoop.
>
>
> Repository: sqoop-trunk
>
>
> Description
> -------
>
> With the current implementation of ClassWriter, the generated table ORM
> classes contain a setField method built around a long chain of if statements
> (with a single branch for every private field). Although this approach works
> perfectly well for small and mid-size tables (in terms of the number of
> columns), for wide ones (>>500 columns) it causes a significant performance
> degradation (making export much slower than it should be, as seen in the
> JIRA task). I've attached a proposed solution to avoid this. According to my
> own measurements, this solution is 250x faster than the current one (tested
> with 800-column-wide table ORMs and 20,000 / 100,000 / 1M / 5M rows).
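>
> For illustration, roughly the two shapes in question (a simplified sketch,
> not the exact generated code; class and field names here are made up, and
> the map-plus-switch dispatch is just one possible way to avoid the linear
> scan):
>
>     import java.util.HashMap;
>     import java.util.Map;
>
>     public class WideRecordSketch {
>       private Integer col0;
>       private String col1;
>       // ... one private field per column, hundreds of them in a wide table
>
>       // Current shape: one branch per column, so every call does up to
>       // O(#columns) string comparisons.
>       public void setFieldLinear(String name, Object val) {
>         if ("col0".equals(name)) { this.col0 = (Integer) val; }
>         else if ("col1".equals(name)) { this.col1 = (String) val; }
>         // ... ~800 more branches for a wide table
>       }
>
>       // Alternative shape: resolve the column name to an ordinal via a
>       // precomputed map, then dispatch with a switch, so each call is
>       // O(1) on average.
>       private static final Map<String, Integer> FIELD_ORDINALS =
>           new HashMap<String, Integer>();
>       static {
>         FIELD_ORDINALS.put("col0", 0);
>         FIELD_ORDINALS.put("col1", 1);
>         // ... one entry per column
>       }
>
>       public void setFieldMapped(String name, Object val) {
>         Integer ordinal = FIELD_ORDINALS.get(name);
>         if (ordinal == null) { return; } // unknown column
>         switch (ordinal) {
>           case 0: this.col0 = (Integer) val; break;
>           case 1: this.col1 = (String) val; break;
>           // ... one case per column
>           default: break;
>         }
>       }
>     }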
>
> Please review it and share your thoughts!
>
>
> Diffs
> -----
>
> src/java/org/apache/sqoop/orm/ClassWriter.java 23a9c41
> src/java/org/apache/sqoop/orm/CompilationManager.java ce165e8
> src/test/com/cloudera/sqoop/orm/TestClassWriter.java 498db73
>
> Diff: https://reviews.apache.org/r/47108/diff/
>
>
> Testing
> -------
>
> The existing unit test case has been extended with one test method that
> simulates the "insertion" of 20,000 rows (calling all 800 setters 20,000
> times with random values), but I've also tested with 100,000 / 1M / 5M rows
> in my local environment. It showed this solution is at least 250x faster.
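>
> Roughly the shape of that timing loop (sizes and names are illustrative
> only; WideRecordSketch refers to the sketch above, not the generated
> class):
>
>     import java.util.Random;
>
>     public class SetterTimingSketch {
>       public static void main(String[] args) {
>         WideRecordSketch record = new WideRecordSketch();
>         Random random = new Random(42);
>
>         long start = System.nanoTime();
>         for (int row = 0; row < 20000; row++) {
>           // a real run would loop over all ~800 column names here
>           record.setFieldMapped("col0", random.nextInt());
>           record.setFieldMapped("col1", "val" + random.nextInt());
>         }
>         long elapsedMs = (System.nanoTime() - start) / 1_000_000;
>         System.out.println("20000 simulated rows took " + elapsedMs + " ms");
>       }
>     }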
>
> Any additional ideas for testing are more than welcome from the community.
>
>
> Thanks,
>
> Attila Szabo
>
>