[ https://issues.apache.org/jira/browse/HADOOP-15557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16522729#comment-16522729 ]
Todd Lipcon commented on HADOOP-15557:
--------------------------------------
Well, it doesn't really work on HDFS, because the concurrent read calls read
from undefined positions, so it's pretty hard for them to be correct. It's just
that on HDFS you'd get something like a checksum error at the HFile format
level, probably resulting in an automatic retry, whereas in the Crypto case,
you get a JVM segfault.
But I guess I take your point that no matter how the streams are used, crashing
is bad, and that documenting (and testing) the concurrency assumptions of all
streams provided by Common would be a worthy undertaking. Sadly, not one I
personally have time for, so I'll retain my role as just a reporter of this
jira :)
> CryptoInputStream can't handle concurrent access; inconsistent with HDFS
> ------------------------------------------------------------------------
>
> Key: HADOOP-15557
> URL: https://issues.apache.org/jira/browse/HADOOP-15557
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs
> Affects Versions: 3.2.0
> Reporter: Todd Lipcon
> Priority: Major
>
> In general, the non-positional read APIs for streams in Hadoop Common are
> meant to be used by only a single thread at a time. It would not make much
> sense to have concurrent multi-threaded access to seek+read because they
> modify the stream's file position. Multi-threaded access on input streams can
> be done using positional read APIs. Multi-threaded access on output streams
> probably never makes sense.
> In the case of DFSInputStream, the non-positional read APIs are marked
> synchronized, so that even when misused, no strange exceptions are thrown.
> The results are just somewhat undefined in that it's hard for a thread to
> know which position was read from. However, when running on an encrypted file
> system, the results are much worse: since CryptoInputStream's read methods
> are not marked synchronized, the caller can get strange ByteBuffer exceptions
> or even a JVM crash due to concurrent use and free of underlying OpenSSL
> Cipher buffers.
> The crypto stream wrappers should be made more resilient to such misuse, for
> example by:
> (a) making the read methods safer by making them synchronized (so they have
> the same behavior as DFSInputStream)
> or
> (b) trying to detect concurrent access to these methods and throwing
> ConcurrentModificationException so that the user is alerted to their probable
> misuse.
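
For illustration only, here is a rough sketch of what options (a) and (b) could
look like. It wraps a plain java.io.InputStream rather than the real
CryptoInputStream, and the class names SynchronizedReadStream and
GuardedReadStream are hypothetical, not taken from Hadoop or from any patch on
this issue. It also covers only the two basic read methods; a real fix would
need to consider seek, skip, and the ByteBuffer read paths as well.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ConcurrentModificationException;
import java.util.concurrent.atomic.AtomicBoolean;

// Option (a), sketched: serialize every read on the wrapper's monitor, so
// concurrent callers cannot interleave inside the underlying stream.
// Hypothetical class, not part of Hadoop.
class SynchronizedReadStream extends FilterInputStream {
    SynchronizedReadStream(InputStream in) {
        super(in);
    }

    @Override
    public synchronized int read() throws IOException {
        return in.read();
    }

    @Override
    public synchronized int read(byte[] b, int off, int len) throws IOException {
        return in.read(b, off, len);
    }
}

// Option (b), sketched: fail fast when a read starts while another read on
// the same wrapper is still in progress. Hypothetical class, not part of Hadoop.
class GuardedReadStream extends FilterInputStream {
    // True while some caller is inside a read method.
    private final AtomicBoolean reading = new AtomicBoolean(false);

    GuardedReadStream(InputStream in) {
        super(in);
    }

    // Claim the stream, or throw if another read already holds it.
    private void enter() {
        if (!reading.compareAndSet(false, true)) {
            throw new ConcurrentModificationException(
                "overlapping read on a stream that is not thread-safe");
        }
    }

    @Override
    public int read() throws IOException {
        enter();
        try {
            return in.read();
        } finally {
            reading.set(false);
        }
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        enter();
        try {
            return in.read(b, off, len);
        } finally {
            reading.set(false);
        }
    }
}
```

The trade-off between the two: the synchronized variant hides the misuse and
leaves the caller with reads from positions it cannot predict, while the
guarded variant surfaces the misuse as an exception. The guard detects only
reads that actually overlap in time, so it is a best-effort check rather than
proof that the caller is single-threaded.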