[ 
https://issues.apache.org/jira/browse/HIVE-4423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642745#comment-13642745
 ] 

Gopal V commented on HIVE-4423:
-------------------------------

|| split location || before || after ||
| store_sales/000000_0:67108864+67108864 | 748 ms |    81 ms  |
| store_sales/000002_0:67108864+67108864 | 966 ms |    54 ms |
| store_sales/000004_0:67108864+67108864 | 948 ms |    51 ms |
| store_sales/000006_0:67108864+67108864 | 922 ms |    42 ms |
| store_sales/000008_0:67108864+67108864 | 842 ms |    40 ms |
| store_sales/000010_0:67108864+67108864 | 1302 ms |   82 ms |
| store_sales/000012_0:67108864+67108864 | 989 ms |    50 ms |
| store_sales/000014_0:67108864+67108864 | 970 ms |    43 ms |
| store_sales/000001_0:67108864+67108864 | 829 ms |    47 ms |
| store_sales/000003_0:67108864+67108864 | 811 ms |    43 ms |
| store_sales/000007_0:67108864+67108864 | 865 ms |    51 ms |
| store_sales/000005_0:67108864+67108864 | 1042 ms |   59 ms |
| store_sales/000009_0:67108864+67108864 | 902 ms |    39 ms |
| store_sales/000011_0:67108864+67108864 | 1046 ms |   42 ms |
| store_sales/000013_0:67108864+67108864 | 1048 ms |   44 ms |

As expected, the function is now an order of magnitude faster, and fast enough 
that the inner sync.length for loop needs no further optimization.
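
For readers without the patch at hand, the general idea of scanning for a sync marker with block reads instead of per-byte reads can be sketched roughly as below. This is not the code from HIVE-4423.patch: the class name, method name, and the 16-byte marker in the demo are assumptions made for illustration, and the sketch uses InputStream.read(byte[], int, int) rather than readFully so that a short final block does not raise EOFException.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical illustration only; names here are not taken from RCFile.
class SyncScanSketch {
    static final int BLOCK_SIZE = 512; // block size mentioned in the issue

    /** Returns the stream offset of the first occurrence of marker, or -1. */
    static long findMarker(InputStream in, byte[] marker) throws IOException {
        int overlap = marker.length - 1;          // tail carried between blocks
        byte[] buf = new byte[overlap + BLOCK_SIZE];
        int carried = 0;                          // bytes kept from the last block
        long bufStart = 0;                        // stream offset of buf[0]
        while (true) {
            // One bulk read per block instead of BLOCK_SIZE single-byte reads.
            int n = in.read(buf, carried, BLOCK_SIZE);
            if (n < 0) {
                return -1;                        // end of stream, no marker seen
            }
            int valid = carried + n;
            // The inner comparison loop stays; it now runs over memory only.
            for (int i = 0; i + marker.length <= valid; i++) {
                int j = 0;
                while (j < marker.length && buf[i + j] == marker[j]) {
                    j++;
                }
                if (j == marker.length) {
                    return bufStart + i;
                }
            }
            // Keep the last (marker.length - 1) bytes so a marker that
            // straddles two blocks is still found on the next pass.
            int keep = Math.min(overlap, valid);
            System.arraycopy(buf, valid - keep, buf, 0, keep);
            bufStart += valid - keep;
            carried = keep;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] marker = new byte[16];             // assumed marker length
        for (int i = 0; i < marker.length; i++) {
            marker[i] = (byte) (i + 1);
        }
        byte[] data = new byte[1300];
        System.arraycopy(marker, 0, data, 505, marker.length); // straddles 512
        System.out.println(findMarker(new ByteArrayInputStream(data), marker));
    }
}
```

The point of the sketch is only that the byte-by-byte comparison can stay as it is once the bytes come from a local buffer, since the cost observed in HDFS-4710 was in the per-call overhead of the synchronized read() path.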

Overall, a 28-second query ran 2+ seconds faster, which is expected: with 
8 slots and 15 mappers, the map tasks run in two waves.
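
A back-of-the-envelope check of that figure, as a rough sketch: it assumes each mapper pays the sync cost once, that the mappers run in sequential waves limited by the slot count, and it uses the averages of the table above (about 949 ms before, 51 ms after). The class and method names are made up for this illustration.

```java
// Hypothetical estimate only; the model ignores scheduling and stragglers.
class WaveEstimate {
    /** Number of sequential waves needed to run all mappers. */
    static int waves(int mappers, int slots) {
        return (mappers + slots - 1) / slots;     // ceiling division
    }

    /** Estimated wall-clock saving in seconds under the model above. */
    static double savedSeconds(int mappers, int slots,
                               double beforeMs, double afterMs) {
        return waves(mappers, slots) * (beforeMs - afterMs) / 1000.0;
    }

    public static void main(String[] args) {
        // 15 mappers on 8 slots -> 2 waves; about 0.9 s saved per wave.
        System.out.println(waves(15, 8));
        System.out.println(savedSeconds(15, 8, 949.0, 51.0));
    }
}
```

Under these assumptions the estimate comes out at about 1.8 seconds, in the neighbourhood of the improvement seen on the query.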
                
> Improve RCFile::sync(long) 10x
> ------------------------------
>
>                 Key: HIVE-4423
>                 URL: https://issues.apache.org/jira/browse/HIVE-4423
>             Project: Hive
>          Issue Type: Improvement
>         Environment: Ubuntu LXC (1 SSD, 1 disk, 32 gigs of RAM)
>            Reporter: Gopal V
>            Assignee: Gopal V
>            Priority: Minor
>             Fix For: 0.11.0
>
>         Attachments: HIVE-4423.patch
>
>
> RCFile::sync(long) takes approximately 1 second every time it gets called 
> because of the inner loops in the function.
> From what was observed with HDFS-4710, single byte reads are an order of 
> magnitude slower than larger 512 byte buffer reads. 
> Even when disk I/O is buffered to this size, there is overhead due to the 
> synchronized read() methods in BlockReaderLocal & RemoteBlockReader classes.
> Replacing the readByte() calls in RCFile.sync(long) with a readFully(512 byte) 
> call will speed up this function by more than 10x.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
