[jira] [Commented] (HDFS-5868) Make hsync implementation pluggable

Buddy (JIRA) Fri, 07 Feb 2014 07:21:35 -0800

    [ 
https://issues.apache.org/jira/browse/HDFS-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13894619#comment-13894619
 ]


Buddy commented on HDFS-5868:
-----------------------------

David,

Thanks very much for looking at this and for the suggestions.

* We see that LOG.warn in our logs, and it is certainly something we would 
like to clean up. However, I think we need to make {{manageWriterOsCache}} 
more pluggable to deal with it. The LOG.warn tells us that some 
functionality is being bypassed, which seems like a valid warning until the 
FsDatasetSpi plugins are able to provide that functionality. Perhaps I should 
file a new sub-jira for that?
* I fully agree with your comments on the sync(OutputStream) signature and I 
like your solution. I have attached a new patch against the latest trunk. 
Thanks very much for pointing that out.

In the latest patch, I still check that the streams are instances of 
FileOutputStream in the syncDataOut() and syncChecksumOut() methods. I think 
that can be avoided by making ReplicaOutputStreams an abstract class with 
abstract methods for syncDataOut() and syncChecksumOut(). Then we could have 
ReplicaFileOutputStreams and ReplicaSimulatedOutputStreams that extend the 
abstract class and implement the sync methods, and FsDatasetSpi providers could 
have their own subclass with their own implementations, as we do.

ReplicaFileOutputStreams would be constructed by 
ReplicaInPipeline.createStreams, since it currently uses FileOutputStreams. 
ReplicaSimulatedOutputStreams would be constructed by SimulatedFSDataset, which 
does not currently use FileOutputStreams.

In order to make this work, the abstract class ReplicaOutputStreams would also 
have to use generics because the Streams are passed into the constructor.
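
To make the idea concrete, here is a rough sketch of what that hierarchy might 
look like. It is not taken from the patch: the getters, the constructor shapes, 
and the choice of ByteArrayOutputStream for the simulated case are assumptions 
for illustration only, and the real classes carry more state than this.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical sketch only; not the actual HDFS classes.
// The type parameter lets each subclass see its streams with their concrete
// type, so no instanceof check or cast is needed in the sync methods.
abstract class ReplicaOutputStreams<T extends OutputStream> {
  private final T dataOut;
  private final T checksumOut;

  protected ReplicaOutputStreams(T dataOut, T checksumOut) {
    this.dataOut = dataOut;
    this.checksumOut = checksumOut;
  }

  T getDataOut() { return dataOut; }
  T getChecksumOut() { return checksumOut; }

  abstract void syncDataOut() throws IOException;
  abstract void syncChecksumOut() throws IOException;
}

// File-backed replicas: force the data and metadata to the device.
class ReplicaFileOutputStreams extends ReplicaOutputStreams<FileOutputStream> {
  ReplicaFileOutputStreams(FileOutputStream dataOut, FileOutputStream checksumOut) {
    super(dataOut, checksumOut);
  }

  @Override
  void syncDataOut() throws IOException {
    getDataOut().getChannel().force(true);
  }

  @Override
  void syncChecksumOut() throws IOException {
    getChecksumOut().getChannel().force(true);
  }
}

// Simulated replicas: nothing durable to sync, so flushing is enough here.
// ByteArrayOutputStream is an assumed stand-in for the simulated streams.
class ReplicaSimulatedOutputStreams extends ReplicaOutputStreams<ByteArrayOutputStream> {
  ReplicaSimulatedOutputStreams(ByteArrayOutputStream dataOut, ByteArrayOutputStream checksumOut) {
    super(dataOut, checksumOut);
  }

  @Override
  void syncDataOut() throws IOException {
    getDataOut().flush();
  }

  @Override
  void syncChecksumOut() throws IOException {
    getChecksumOut().flush();
  }
}
```

An FsDatasetSpi provider would add its own subclass in the same way, with 
whatever sync semantics its storage needs.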

It seems to me that this goes a bit too far for this patch, and that we should 
consider solving the manageWriterOsCache problem before doing something like 
this. But I would like to get your feedback and am happy to do it if the 
community thinks it is a good solution at this point.


> Make hsync implementation pluggable
> -----------------------------------
>
>                 Key: HDFS-5868
>                 URL: https://issues.apache.org/jira/browse/HDFS-5868
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode
>    Affects Versions: 2.2.0
>            Reporter: Buddy
>         Attachments: HDFS-5868-branch-2.patch
>
>
> The current implementation of hsync in BlockReceiver only works if the output 
> streams are instances of FileOutputStream. Therefore, there is currently no 
> way for an FsDatasetSpi plugin to implement hsync if it is not using standard 
> OS files.
> One possible solution is to push the implementation of hsync into the 
> ReplicaOutputStreams class. This class is constructed by the 
> ReplicaInPipeline, which is constructed by the FsDatasetSpi plugin, so 
> it can be extended. Instead of directly calling sync on the output stream, 
> BlockReceiver would call ReplicaOutputStreams.sync. The default 
> implementation of sync in ReplicaOutputStreams would be the same as the 
> current implementation in BlockReceiver.
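
A minimal sketch of the approach in the description above, assuming the default 
sync keeps the FileOutputStream check and a plugin overrides it. All class 
names here (DefaultReplicaOutputStreams, CountingReplicaOutputStreams, 
BlockReceiverSketch) are hypothetical stand-ins, not the real HDFS classes, and 
flushing before the sync is omitted for brevity.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical stand-in for ReplicaOutputStreams with a default sync.
class DefaultReplicaOutputStreams {
  private final OutputStream dataOut;
  private final OutputStream checksumOut;

  DefaultReplicaOutputStreams(OutputStream dataOut, OutputStream checksumOut) {
    this.dataOut = dataOut;
    this.checksumOut = checksumOut;
  }

  void syncDataOut() throws IOException { forceIfFile(dataOut); }
  void syncChecksumOut() throws IOException { forceIfFile(checksumOut); }

  // Default behaviour: only file-backed streams can be forced to disk;
  // any other stream type is silently skipped.
  private static void forceIfFile(OutputStream out) throws IOException {
    if (out instanceof FileOutputStream) {
      ((FileOutputStream) out).getChannel().force(true);
    }
  }
}

// Hypothetical plugin subclass: replaces the default sync with its own logic.
// Here it only counts calls; a real plugin would persist to its own storage.
class CountingReplicaOutputStreams extends DefaultReplicaOutputStreams {
  int syncCalls = 0;

  CountingReplicaOutputStreams(OutputStream dataOut, OutputStream checksumOut) {
    super(dataOut, checksumOut);
  }

  @Override
  void syncDataOut() { syncCalls++; }

  @Override
  void syncChecksumOut() { syncCalls++; }
}

// Hypothetical caller: delegates to the streams object instead of casting
// the raw streams itself.
class BlockReceiverSketch {
  private final DefaultReplicaOutputStreams streams;

  BlockReceiverSketch(DefaultReplicaOutputStreams streams) {
    this.streams = streams;
  }

  void flushOrSync(boolean isSync) throws IOException {
    if (isSync) {
      streams.syncChecksumOut();
      streams.syncDataOut();
    }
  }
}
```

With this shape the caller never needs to know what kind of stream backs the 
replica; the plugin decides what a sync means for its storage.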



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
