-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10795/#review19770
-----------------------------------------------------------

Ship it!

- Ashutosh Chauhan


On April 26, 2013, 11:25 a.m., Gopal V wrote:
> 
> (Updated April 26, 2013, 11:25 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Gunther Hagleitner.
> 
> 
> Description
> -------
> 
> Speed up RCFile::sync() by reading large blocks of data from HDFS rather than
> using readByte() on the input stream.
> 
> This improves the loop behaviour and reduces the number of calls on the
> synchronized read() methods within HDFS, resulting in a 10x performance boost
> to this function.
> 
> In real time, a call that takes up to a second drops below 100 ms, because
> the data is read in 512-byte chunks instead of 1 byte at a time.
> 
> 
> This addresses bug HIVE-4423.
>     https://issues.apache.org/jira/browse/HIVE-4423
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java d3d98d0 
> 
> Diff: https://reviews.apache.org/r/10795/diff/
> 
> 
> Testing
> -------
> 
> ant test -Dtestcase=TestRCFile -Dmodule=ql
> ant test -Dtestcase=TestCliDriver -Dqfile_regex=.*rcfile.* -Dmodule=ql
> 
> And benchmarking with count(1) on the store_sales rcfile table at scale=10
> 
> before: 43.8, after: 39.5
> 
> 
> Thanks,
> 
> Gopal V
> 
>
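
For readers following the thread, the chunked scan described in the quoted
request can be sketched roughly as below. This is not the HIVE-4423 patch:
the class name SyncScanSketch, the method findMarker, and the 4-byte marker
used in main are hypothetical, and a plain java.io.InputStream stands in for
the HDFS stream. Only the 512-byte chunk size comes from the description.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch: locate a byte marker by reading chunks, not single bytes.
final class SyncScanSketch {

    /**
     * Returns the offset, relative to where the stream started, of the first
     * occurrence of marker, or -1 if the stream ends before one is found.
     */
    static long findMarker(InputStream in, byte[] marker, int chunkSize) throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        if (marker.length == 0) {
            return 0;
        }
        // Room for one chunk plus the tail carried over from the previous chunk,
        // so a marker that straddles a chunk boundary is still found.
        byte[] buf = new byte[chunkSize + marker.length - 1];
        int carried = 0; // bytes kept from the previous iteration
        long base = 0;   // stream offset of buf[0]
        while (true) {
            int n = in.read(buf, carried, chunkSize); // one call per chunk
            if (n < 0) {
                return -1;
            }
            int valid = carried + n;
            for (int i = 0; i + marker.length <= valid; i++) {
                if (matches(buf, i, marker)) {
                    return base + i;
                }
            }
            // Keep the last (marker.length - 1) bytes: a match could begin there.
            int keep = Math.min(marker.length - 1, valid);
            System.arraycopy(buf, valid - keep, buf, 0, keep);
            base += valid - keep;
            carried = keep;
        }
    }

    private static boolean matches(byte[] buf, int from, byte[] marker) {
        for (int j = 0; j < marker.length; j++) {
            if (buf[from + j] != marker[j]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        byte[] marker = {1, 2, 3, 4};           // made-up marker for the demo
        byte[] data = new byte[2000];
        System.arraycopy(marker, 0, data, 510, marker.length); // straddles byte 512
        long offset = findMarker(new ByteArrayInputStream(data), marker, 512);
        System.out.println("marker found at offset " + offset);
    }
}
```

The carry-over of the last marker.length - 1 bytes is the one subtle part:
without it, a marker split across two chunks would be missed. Each loop
iteration issues a single read() call per chunk, which is where the saving
over readByte() on a synchronized stream would come from.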