[ 
https://issues.apache.org/jira/browse/HADOOP-11708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606152#comment-14606152
 ] 

Colin Patrick McCabe commented on HADOOP-11708:
-----------------------------------------------

Thanks, [~busbey].  I see that we have a file 
{{hadoop-common-project/hadoop-common/src/site/markdown/filesystem/fsdatainputstream.md}}
 that discusses the concurrency guarantees of Hadoop input streams now.  
[~steve_l], do we have one for output streams as well?  Maybe I missed it?  If 
not, we should create something like that.

> CryptoOutputStream synchronization differences from DFSOutputStream break 
> HBase
> -------------------------------------------------------------------------------
>
>                 Key: HADOOP-11708
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11708
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 2.6.0
>            Reporter: Sean Busbey
>            Assignee: Sean Busbey
>            Priority: Critical
>
> For the write-ahead-log, HBase writes to DFS from a single thread and sends 
> sync/flush/hflush from a configurable number of other threads (default 5).
> FSDataOutputStream does not document anything about being thread safe, and it 
> is not thread safe for concurrent writes.
> However, DFSOutputStream is thread safe for concurrent writes + syncs. When 
> it is the stream FSDataOutputStream wraps, the combination is thread safe for 
> 1 writer and multiple syncs (the exact behavior HBase relies on).
> When HDFS Transparent Encryption is turned on, CryptoOutputStream is inserted 
> between FSDataOutputStream and DFSOutputStream. It is proactively labeled as 
> not thread safe, and this composition is not thread safe for any operations.
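
To make the access pattern in the description concrete, here is a minimal Java sketch of one writer plus several flushing threads sharing a single stream. Nothing in it is Hadoop code: {{UnsafeBufferingStream}} is a hypothetical stand-in for a buffering wrapper with no internal locking, and {{LockedOutputStream}} is a hypothetical wrapper that serializes write/flush on one lock. It only illustrates one possible way to make such a composition safe for 1 writer and multiple syncs, and should not be read as the fix adopted for this issue.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

/** Demo driver: one writer (the calling thread) and N flushing threads. */
class OneWriterManySyncers {
    /** Writes `records` 5-byte records while `syncers` threads flush; returns bytes in the sink. */
    static int run(int records, int syncers) throws Exception {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        OutputStream out = new LockedOutputStream(new UnsafeBufferingStream(sink));
        byte[] record = "edit\n".getBytes(StandardCharsets.US_ASCII);
        AtomicBoolean done = new AtomicBoolean(false);

        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < syncers; i++) {
            Thread t = new Thread(() -> {
                try {
                    while (!done.get()) {
                        out.flush();            // concurrent "sync" calls
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            threads.add(t);
            t.start();
        }
        for (int i = 0; i < records; i++) {
            out.write(record);                  // the single writer
        }
        done.set(true);
        for (Thread t : threads) {
            t.join();
        }
        out.flush();                            // drain whatever is still buffered
        return sink.size();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(10_000, 5) + " bytes reached the sink");
    }
}

/** Hypothetical stand-in for a buffering stream with no internal locking. */
class UnsafeBufferingStream extends OutputStream {
    private final OutputStream sink;
    private final byte[] buf = new byte[64];
    private int count = 0;

    UnsafeBufferingStream(OutputStream sink) { this.sink = sink; }

    @Override public void write(int b) throws IOException {
        if (count == buf.length) {
            drain();
        }
        buf[count++] = (byte) b;
    }

    @Override public void flush() throws IOException {
        drain();
        sink.flush();
    }

    private void drain() throws IOException {
        sink.write(buf, 0, count);
        count = 0;
    }
}

/** Hypothetical wrapper: every write/flush/close takes the same lock. */
class LockedOutputStream extends OutputStream {
    private final OutputStream inner;
    private final Object lock = new Object();

    LockedOutputStream(OutputStream inner) { this.inner = inner; }

    @Override public void write(int b) throws IOException {
        synchronized (lock) { inner.write(b); }
    }

    @Override public void write(byte[] b, int off, int len) throws IOException {
        synchronized (lock) { inner.write(b, off, len); }
    }

    @Override public void flush() throws IOException {
        synchronized (lock) { inner.flush(); }
    }

    @Override public void close() throws IOException {
        synchronized (lock) { inner.close(); }
    }
}
```

With the lock in place, the sink should end up with exactly {{records * 5}} bytes. If the {{LockedOutputStream}} layer is removed, the flushers race with the writer on {{buf}} and {{count}}, so bytes can be lost or duplicated, which is the kind of failure the description attributes to the unsynchronized composition.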



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
