[ https://issues.apache.org/jira/browse/HIVE-4423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642745#comment-13642745 ]
Gopal V commented on HIVE-4423:
-------------------------------

|| split location || before || after ||
| store_sales/000000_0:67108864+67108864 | 748 ms | 81 ms |
| store_sales/000002_0:67108864+67108864 | 966 ms | 54 ms |
| store_sales/000004_0:67108864+67108864 | 948 ms | 51 ms |
| store_sales/000006_0:67108864+67108864 | 922 ms | 42 ms |
| store_sales/000008_0:67108864+67108864 | 842 ms | 40 ms |
| store_sales/000010_0:67108864+67108864 | 1302 ms | 82 ms |
| store_sales/000012_0:67108864+67108864 | 989 ms | 50 ms |
| store_sales/000014_0:67108864+67108864 | 970 ms | 43 ms |
| store_sales/000001_0:67108864+67108864 | 829 ms | 47 ms |
| store_sales/000003_0:67108864+67108864 | 811 ms | 43 ms |
| store_sales/000007_0:67108864+67108864 | 865 ms | 51 ms |
| store_sales/000005_0:67108864+67108864 | 1042 ms | 59 ms |
| store_sales/000009_0:67108864+67108864 | 902 ms | 39 ms |
| store_sales/000011_0:67108864+67108864 | 1046 ms | 42 ms |
| store_sales/000013_0:67108864+67108864 | 1048 ms | 44 ms |

As expected, the function is faster by an order of magnitude, and fast enough that the inner sync.length for loop needs no further optimization.

Overall, the query ran 2+ seconds faster out of a 28 second total, which is expected since we have 8 slots and 15 mappers.

> Improve RCFile::sync(long) 10x
> ------------------------------
>
>                 Key: HIVE-4423
>                 URL: https://issues.apache.org/jira/browse/HIVE-4423
>             Project: Hive
>          Issue Type: Improvement
>         Environment: Ubuntu LXC (1 SSD, 1 disk, 32 gigs of RAM)
>            Reporter: Gopal V
>            Assignee: Gopal V
>            Priority: Minor
>             Fix For: 0.11.0
>
>         Attachments: HIVE-4423.patch
>
>
> RCFile::sync(long) takes approximately 1 second every time it is called because of
> the inner loops in the function.
> From what was observed with HDFS-4710, single-byte reads are an order of
> magnitude slower than larger 512-byte buffer reads.
> Even when disk I/O is buffered to this size, there is overhead due to the
> synchronized read() methods in the BlockReaderLocal & RemoteBlockReader classes.
> Replacing the readByte() calls in RCFile.sync(long) with a readFully(512 byte)
> call will speed up this function >10x.
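
To illustrate the idea in the description, here is a minimal sketch of scanning a stream for a sync marker using 512-byte chunk reads instead of one read per byte. It is not the code from HIVE-4423.patch: the class and method names (SyncScanSketch, findMarker, readChunk) are hypothetical, the 16-byte marker in the demo is an assumption, and a plain java.io.InputStream stands in for the HDFS input stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch, not the actual RCFile code.
class SyncScanSketch {
    static final int BUFFER_SIZE = 512; // chunk size mentioned in the issue

    /** Returns the stream offset of the first occurrence of marker, or -1 if absent. */
    static long findMarker(InputStream in, byte[] marker) throws IOException {
        if (marker.length == 0) {
            throw new IllegalArgumentException("marker must not be empty");
        }
        // Room for one chunk plus the tail carried over from the previous chunk.
        byte[] buf = new byte[BUFFER_SIZE + marker.length - 1];
        int carried = 0; // bytes kept from the previous chunk
        long base = 0;   // stream offset of buf[0]
        while (true) {
            int n = readChunk(in, buf, carried, BUFFER_SIZE);
            if (n == 0) {
                return -1; // end of stream, nothing new to search
            }
            int valid = carried + n;
            for (int i = 0; i + marker.length <= valid; i++) {
                if (matchesAt(buf, i, marker)) {
                    return base + i;
                }
            }
            // Keep the last (marker.length - 1) bytes so that a marker
            // straddling two chunks is still found.
            int keep = Math.min(marker.length - 1, valid);
            System.arraycopy(buf, valid - keep, buf, 0, keep);
            base += valid - keep;
            carried = keep;
        }
    }

    /** Reads up to len bytes into buf at off; returns the count read (0 at end of stream). */
    static int readChunk(InputStream in, byte[] buf, int off, int len) throws IOException {
        int total = 0;
        while (total < len) {
            int n = in.read(buf, off + total, len - total);
            if (n < 0) {
                break;
            }
            total += n;
        }
        return total;
    }

    static boolean matchesAt(byte[] buf, int pos, byte[] marker) {
        for (int j = 0; j < marker.length; j++) {
            if (buf[pos + j] != marker[j]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        byte[] marker = new byte[16]; // assumed marker length, for the demo only
        for (int i = 0; i < marker.length; i++) {
            marker[i] = (byte) (i + 1);
        }
        byte[] data = new byte[1000];
        System.arraycopy(marker, 0, data, 505, marker.length); // straddles the first chunk boundary
        System.out.println(findMarker(new ByteArrayInputStream(data), marker));
    }
}
```

The point of the sketch is that the underlying stream sees one read call per 512 bytes rather than one per byte, so any per-call cost such as a synchronized read() is paid far less often. The byte-by-byte comparison still happens, but only against an in-memory array.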