-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10795/#review19770
-----------------------------------------------------------

Ship it!

- Ashutosh Chauhan


On April 26, 2013, 11:25 a.m., Gopal V wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/10795/
> -----------------------------------------------------------
> 
> (Updated April 26, 2013, 11:25 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Gunther Hagleitner.
> 
> 
> Description
> -------
> 
> Speed up RCFile::sync() by reading large blocks of data from HDFS rather than 
> using readByte() on the input stream. 
> 
> This improves the loop behaviour and reduces the number of calls on the 
> synchronized read() methods within HDFS, resulting in a 10x performance boost 
> to this function.
> 
> In wall-clock terms, it brings a call that takes up to a second down to under 
> 100 ms, by reading 512-byte chunks instead of reading data 1 byte at a time.
> 
> 
> This addresses bug HIVE-4423.
>     https://issues.apache.org/jira/browse/HIVE-4423
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java d3d98d0 
> 
> Diff: https://reviews.apache.org/r/10795/diff/
> 
> 
> Testing
> -------
> 
> ant test -Dtestcase=TestRCFile -Dmodule=ql
> ant test -Dtestcase=TestCliDriver -Dqfile_regex=.*rcfile.* -Dmodule=ql
> 
> Also benchmarked with count(1) on the store_sales RCFile table at scale=10:
> 
> before: 43.8, after: 39.5 
> 
> 
> Thanks,
> 
> Gopal V
> 
>
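
For readers following the thread, the chunked-scan idea in the description can be sketched roughly as below. This is not the code from the HIVE-4423 patch: the class and method names (SyncScanSketch, findMarker) are hypothetical, a plain java.io.InputStream stands in for the HDFS input stream, and the marker bytes are made up. Only the 512-byte chunk size comes from the description. The point it illustrates is that each read() call fetches a whole chunk, and the last few bytes of each chunk are carried over so a marker straddling a chunk boundary is still found.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch: scan a stream for a byte marker using chunked reads
// instead of one read() call per byte. Not the actual RCFile implementation.
final class SyncScanSketch {

    /**
     * Returns the offset (relative to where the stream started) of the first
     * occurrence of marker, or -1 if the stream ends before one is found.
     */
    static long findMarker(InputStream in, byte[] marker, int chunkSize) throws IOException {
        if (marker.length == 0 || chunkSize <= 0) {
            throw new IllegalArgumentException("marker and chunkSize must be non-empty/positive");
        }
        int overlap = marker.length - 1;          // bytes carried across chunk boundaries
        byte[] buf = new byte[overlap + chunkSize];
        int kept = 0;                             // carried-over bytes at the start of buf
        long base = 0;                            // stream offset corresponding to buf[0]

        while (true) {
            int n = in.read(buf, kept, chunkSize); // one call fetches up to chunkSize bytes
            if (n < 0) {
                return -1;                         // end of stream, no marker found
            }
            int filled = kept + n;
            for (int i = 0; i + marker.length <= filled; i++) {
                if (matchesAt(buf, i, marker)) {
                    return base + i;
                }
            }
            // Keep the tail so a marker split across two chunks is still detected.
            int keep = Math.min(overlap, filled);
            System.arraycopy(buf, filled - keep, buf, 0, keep);
            base += filled - keep;
            kept = keep;
        }
    }

    private static boolean matchesAt(byte[] buf, int start, byte[] marker) {
        for (int j = 0; j < marker.length; j++) {
            if (buf[start + j] != marker[j]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        byte[] marker = {1, 2, 3, 4};              // made-up marker for the demo
        byte[] data = new byte[2000];
        System.arraycopy(marker, 0, data, 510, marker.length); // straddles the 512-byte boundary
        long pos = findMarker(new ByteArrayInputStream(data), marker, 512);
        System.out.println("marker found at offset " + pos);
    }
}
```

With a 512-byte chunk, a scan over N bytes makes roughly N/512 read() calls rather than N, which is where the reduction in synchronized calls mentioned in the description would come from.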
