[
https://issues.apache.org/jira/browse/HADOOP-10694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14620604#comment-14620604
]
Tsuyoshi Ozawa commented on HADOOP-10694:
-----------------------------------------
[~gopalv] thank you for the contribution. +1 for the change.
However, it's a bit dangerous to remove synchronization from DataInputBuffer
because there are lots of callers of DataInputBuffer. How about adding
NonSyncByteArrayInputStream and changing MapTask and ReduceTask to use it
instead of removing the lock?
Additionally, I think it would be better for NonSyncByteArrayInputStream to
extend DataInputBuffer, because then we would not need to change any other
code except the new statements, which would create a
NonSyncByteArrayInputStream instead of a DataInputBuffer.
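To make the suggestion concrete, here is a rough sketch of what an unsynchronized stream could look like. It is not the Hive class and not the attached patch: the class body, the constructors, and the reset(byte[], int, int) helper are assumptions for illustration only. For simplicity it extends java.io.ByteArrayInputStream rather than DataInputBuffer, and overrides the synchronized read methods with versions that take no lock.

```java
import java.io.ByteArrayInputStream;

// Hypothetical sketch: same name as the proposed class, but the body is an
// illustration, not code from Hive or from HADOOP-10694.1.patch.
// Not thread-safe: callers must confine each instance to one thread.
class NonSyncByteArrayInputStream extends ByteArrayInputStream {
    NonSyncByteArrayInputStream(byte[] buf) {
        super(buf);
    }

    NonSyncByteArrayInputStream(byte[] buf, int offset, int length) {
        super(buf, offset, length);
    }

    // Assumed helper: re-point the stream at a new backing array so one
    // instance can be reused across records.
    void reset(byte[] input, int start, int length) {
        this.buf = input;
        this.pos = start;
        this.mark = start;
        this.count = Math.min(start + length, input.length);
    }

    // Overrides the synchronized read() without taking the monitor.
    @Override
    public int read() {
        return (pos < count) ? (buf[pos++] & 0xff) : -1;
    }

    @Override
    public int read(byte[] b, int off, int len) {
        if (b == null) {
            throw new NullPointerException();
        }
        if (off < 0 || len < 0 || len > b.length - off) {
            throw new IndexOutOfBoundsException();
        }
        if (pos >= count) {
            return -1; // end of stream
        }
        int n = Math.min(len, count - pos);
        if (n <= 0) {
            return 0;
        }
        System.arraycopy(buf, pos, b, off, n);
        pos += n;
        return n;
    }

    @Override
    public long skip(long n) {
        long k = Math.min(n, (long) (count - pos));
        if (k < 0) {
            k = 0;
        }
        pos += (int) k;
        return k;
    }

    @Override
    public int available() {
        return count - pos;
    }
}
```

A caller such as MapTask or ReduceTask could then wrap an instance in a java.io.DataInputStream and call readInt() as before, with the per-call monitor acquisition gone. Whether that is safe depends on each instance being used by a single thread, which is the concern about the existing DataInputBuffer callers.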
> Remove synchronized input streams from Writable deserialization
> ---------------------------------------------------------------
>
> Key: HADOOP-10694
> URL: https://issues.apache.org/jira/browse/HADOOP-10694
> Project: Hadoop Common
> Issue Type: Bug
> Components: io
> Reporter: Gopal V
> Assignee: Gopal V
> Labels: BB2015-05-TBR
> Attachments: HADOOP-10694.1.patch, writable-read-sync.png
>
>
> Writable deserialization is slowing down due to a synchronized block within
> DataInputBuffer$Buffer.
> ByteArrayInputStream::read() is synchronized and this shows up as a slow
> uncontended lock.
> Hive ships with its own faster thread-unsafe version with
> hive.common.io.NonSyncByteArrayInputStream.
> !writable-read-sync.png!
> The DataInputBuffer and Writable deserialization should not require a lock
> per readInt()/read().
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)