[ 
https://issues.apache.org/jira/browse/HADOOP-11708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359026#comment-14359026
 ] 

Steve Loughran commented on HADOOP-11708:
-----------------------------------------

More succinctly, a pre-emptive -1 to any changes to DFSOutputStream sync logic 
in 2.7: such changes need to be accompanied by thought, investigation & pretty 
rigorous specifications (currently in Z-pretending-to-be-Python, but I'll 
happily take TLA+ if you prefer).

Changing the HBase WAL to work with unsynced streams, or fixing 
CryptoOutputStream to implement the same expectations as HDFS, is much lower 
risk.

> CryptoOutputStream synchronization differences from DFSOutputStream break 
> HBase
> -------------------------------------------------------------------------------
>
>                 Key: HADOOP-11708
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11708
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 2.6.0
>            Reporter: Sean Busbey
>            Assignee: Sean Busbey
>            Priority: Critical
>
> For the write-ahead-log, HBase writes to DFS from a single thread and sends 
> sync/flush/hflush from a configurable number of other threads (default 5).
> FSDataOutputStream does not document anything about being thread safe, and it 
> is not thread safe for concurrent writes.
> However, DFSOutputStream is thread safe for concurrent writes + syncs. When 
> it is the stream FSDataOutputStream wraps, the combination is thread safe for 
> one writer and multiple syncs (the exact behavior HBase relies on).
> When HDFS Transparent Encryption is turned on, CryptoOutputStream is inserted 
> between FSDataOutputStream and DFSOutputStream. It is proactively labeled as 
> not thread safe, and this composition is not thread safe for any operations.
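
For illustration only, here is a minimal sketch of the "one writer, several 
syncers" pattern described above, using plain java.io rather than Hadoop 
classes. All names are hypothetical: UnsafeBufferingStream stands in for a 
stream that is not thread safe (the role CryptoOutputStream plays in the 
description), and LockedStream is one possible way to serialize write and 
flush on a single lock. It does not model hflush/hsync semantics and is not 
the fix proposed on this issue.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for a buffering stream that is NOT thread safe. */
class UnsafeBufferingStream extends FilterOutputStream {
    private final byte[] buf = new byte[64];
    private int count = 0;

    UnsafeBufferingStream(OutputStream out) { super(out); }

    @Override public void write(int b) throws IOException {
        if (count == buf.length) { drain(); }
        buf[count++] = (byte) b;
    }

    @Override public void flush() throws IOException {
        drain();
        out.flush();
    }

    // Unsynchronized: a concurrent write() and flush() could corrupt count.
    private void drain() throws IOException {
        out.write(buf, 0, count);
        count = 0;
    }
}

/** Hypothetical wrapper: write and flush share one lock. */
class LockedStream extends FilterOutputStream {
    LockedStream(OutputStream out) { super(out); }

    @Override public synchronized void write(int b) throws IOException {
        out.write(b);
    }

    @Override public synchronized void write(byte[] b, int off, int len)
            throws IOException {
        out.write(b, off, len);
    }

    @Override public synchronized void flush() throws IOException {
        out.flush();
    }
}

public class SingleWriterManySyncers {
    /** One writer (the calling thread) plus `syncers` flushing threads. */
    static byte[] run(int records, int syncers) throws Exception {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        OutputStream wal = new LockedStream(new UnsafeBufferingStream(sink));
        AtomicBoolean done = new AtomicBoolean(false);
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < syncers; i++) {
            Thread t = new Thread(() -> {
                try {
                    while (!done.get()) { wal.flush(); }
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            });
            t.start();
            threads.add(t);
        }
        for (int i = 0; i < records; i++) {
            wal.write(i & 0x7f);
        }
        done.set(true);
        for (Thread t : threads) { t.join(); }
        wal.flush(); // push out anything written after the last sync
        return sink.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] out = run(20_000, 5);
        System.out.println("bytes written: " + out.length);
    }
}
```

With LockedStream in place, every byte written should reach the sink in 
order. Wrapping UnsafeBufferingStream directly, without the lock, gives no 
such guarantee, which is the kind of composition problem the description 
reports.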



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
