[ https://issues.apache.org/jira/browse/HDFS-14345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786998#comment-16786998 ]
Gopal V commented on HDFS-14345:
--------------------------------
bq. HDFS-6803, HDFS-6735, HADOOP-15557, and HADOOP-11708 spring to mind. We
don't dare weaken the thread safety of the default DFS streams.
All of those are about positional reads. I'd rather not touch that part, because
it is doing the right thing (and ORC is using readFully etc.).
This ticket is specifically about the non-positional read (i.e., the one that
modifies the offset of the read position and isn't really synchronized).
bq. which of these are places where switching to a new version would deliver
speedups without doing anything to HDFS or other FSDataInputStream-wrapped
connections?
Most of MapReduce would benefit; the place where this is significant right now
is Shuffle output/input.
Here are the Tez and Hive fixes I've made (and HADOOP-10694, related to Writable):
https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/common/io/NonSyncByteArrayInputStream.java
+
https://github.com/apache/tez/blob/master/tez-common/src/main/java/org/apache/tez/common/io/NonSyncByteArrayInputStream.java
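For readers who don't want to click through, here is a minimal sketch of the general idea behind a non-synchronized byte-array stream. It is not the Hive or Tez source; the class name UnsyncByteArrayInputStream is hypothetical and only the JDK's ByteArrayInputStream is assumed. The overrides drop the synchronized keyword that java.io.ByteArrayInputStream puts on read(), so the result is only safe when a single thread owns the stream.

```java
import java.io.ByteArrayInputStream;

/**
 * Hypothetical illustration only: a ByteArrayInputStream whose read methods
 * are not synchronized. Not thread-safe by design.
 */
class UnsyncByteArrayInputStream extends ByteArrayInputStream {

    UnsyncByteArrayInputStream(byte[] data) {
        super(data);
    }

    /** Single-byte read without taking the stream's monitor. */
    @Override
    public int read() {
        // buf, pos and count are protected fields of ByteArrayInputStream.
        return (pos < count) ? (buf[pos++] & 0xff) : -1;
    }

    /** Bulk read without taking the stream's monitor. */
    @Override
    public int read(byte[] b, int off, int len) {
        if (b == null) {
            throw new NullPointerException();
        }
        if (off < 0 || len < 0 || len > b.length - off) {
            throw new IndexOutOfBoundsException();
        }
        if (pos >= count) {
            return -1; // end of stream
        }
        int n = Math.min(len, count - pos);
        if (n <= 0) {
            return 0;
        }
        System.arraycopy(buf, pos, b, off, n);
        pos += n;
        return n;
    }
}
```

Because it is still a ByteArrayInputStream, it can be passed anywhere an InputStream is expected, which is what makes this kind of swap cheap in shuffle-style code that reads from in-memory buffers on one thread.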
> fs.BufferedFSInputStream::read is synchronized
> ----------------------------------------------
>
> Key: HDFS-14345
> URL: https://issues.apache.org/jira/browse/HDFS-14345
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 3.1.2
> Reporter: Gopal V
> Priority: Major
> Attachments: bufferedinputstream-read-sync.png
>
>
> BufferedInputStream::read() has performance issues - this can be fixed by
> wrapping the stream in another non-synchronized buffered inputstream, but
> that incurs memory copy overheads and is sub-optimal.
> https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/io/BufferedInputStream.java#L269
> Hadoop fs streams aren't thread-safe (except for readFully) and are stateful
> for position, so this synchronization is purely a tax without benefit.
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/BufferedFSInputStream.java#L35
> readFully bypasses the BufferedInputStream superclass.