[ 
https://issues.apache.org/jira/browse/HADOOP-11708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359257#comment-14359257
 ] 

Steve Loughran commented on HADOOP-11708:
-----------------------------------------

bq. FWIW, I just picked the first unreleased versions on the jira. 

OK, setting 2.8 as the target.

bq. It's chasing one undocumented and likely broken implementation with another 
one.

"Broken" is an opinion I'm not sure I agree with:

# The behaviour is certainly not documented or explicitly specified in the [FS 
compatibility 
spec|https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/filesystem/introduction.html]
# It is a stronger concurrency/consistency model than presented by 
{{OutputStream}}, so {{DFSOutputStream}} can be used wherever an 
{{OutputStream}} is needed.
# It's clear that this behaviour is expected in at least one application.

In the 
[FileSystem|https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/filesystem/filesystem.html]
 section of that spec, for {{listStatus()}} and {{mkdirs()}}, we do explicitly 
call out the atomicity/concurrency expectations *as defined by HDFS*. Some of 
those are not the result of deliberate decisions (the fact that {{mkdirs()}} is 
atomic is due to the NN grabbing a lock for optimised directory path creation), 
but they are behaviours that we have to accept as de facto standards, as 
defined by the applications running above HDFS. All we can do is document them 
for the benefit of other filesystems seeking HDFS compatibility, and try not to 
change them in HDFS in ways that break applications. Documenting the 
concurrency semantics of output streams in the same way is how to handle this. 
Given that HDFS encryption is intended to be transparent, the encrypted stream 
is going to have to offer the same concurrency & consistency model as the 
unencrypted one. 
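
To make the access pattern concrete, here is a minimal sketch of "one writer, several flushers" over a wrapped stream. It uses only {{java.io}}, not the Hadoop classes. {{UnsafeBufferingStream}} and {{LockedStream}} are hypothetical names invented for this illustration: the first stands in for any buffering layer that is not thread safe, the second shows one possible way to serialize {{write()}} and {{flush()}} on a single lock. It is not the change proposed on this JIRA, only an illustration of the semantics under discussion.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

public class SyncedStreamSketch {

    /** Hypothetical stand-in for a buffering layer that is NOT thread safe. */
    static class UnsafeBufferingStream extends FilterOutputStream {
        private final byte[] buf = new byte[64];
        private int count = 0;

        UnsafeBufferingStream(OutputStream out) { super(out); }

        @Override
        public void write(int b) throws IOException {
            if (count == buf.length) {
                drain();
            }
            buf[count++] = (byte) b;
        }

        @Override
        public void flush() throws IOException {
            drain();
            out.flush();
        }

        // Pushes buffered bytes downstream; unsafe if raced against write().
        private void drain() throws IOException {
            out.write(buf, 0, count);
            count = 0;
        }
    }

    /** Hypothetical wrapper: write() and flush() share one lock. */
    static class LockedStream extends FilterOutputStream {
        LockedStream(OutputStream out) { super(out); }

        @Override
        public synchronized void write(int b) throws IOException { out.write(b); }

        @Override
        public synchronized void flush() throws IOException { out.flush(); }
    }

    /** One writer (the calling thread) plus N flusher threads; returns bytes received. */
    static int run(int totalBytes, int flushers) throws Exception {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        OutputStream stream = new LockedStream(new UnsafeBufferingStream(sink));
        AtomicBoolean done = new AtomicBoolean(false);

        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < flushers; i++) {
            Thread t = new Thread(() -> {
                try {
                    while (!done.get()) {
                        stream.flush();
                        Thread.yield();
                    }
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            });
            threads.add(t);
            t.start();
        }

        for (int i = 0; i < totalBytes; i++) {
            stream.write(i & 0xff);
        }
        done.set(true);
        for (Thread t : threads) {
            t.join();
        }
        stream.flush(); // final flush so nothing is left in the buffer
        return sink.size();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("bytes received: " + run(10000, 5));
    }
}
```

With the lock in place every byte written reaches the sink. Remove {{LockedStream}} from the chain and the flushers race the writer on {{buf}} and {{count}}, which is the kind of failure an application sees when a layer that is not thread safe is inserted under a stream it had been treating as safe.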



> CryptoOutputStream synchronization differences from DFSOutputStream break 
> HBase
> -------------------------------------------------------------------------------
>
>                 Key: HADOOP-11708
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11708
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 2.6.0
>            Reporter: Sean Busbey
>            Assignee: Sean Busbey
>            Priority: Critical
>
> For the write-ahead-log, HBase writes to DFS from a single thread and sends 
> sync/flush/hflush from a configurable number of other threads (default 5).
> FSDataOutputStream does not document anything about being thread safe, and it 
> is not thread safe for concurrent writes.
> However, DFSOutputStream is thread safe for concurrent writes + syncs. When 
> it is the stream FSDataOutputStream wraps, the combination is threadsafe for 
> 1 writer and multiple syncs (the exact behavior HBase relies on).
> When HDFS Transparent Encryption is turned on, CryptoOutputStream is inserted 
> between FSDataOutputStream and DFSOutputStream. It is proactively labeled as 
> not thread safe, and this composition is not thread safe for any operations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
