[ https://issues.apache.org/jira/browse/HDFS-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13894619#comment-13894619 ]
Buddy commented on HDFS-5868:
-----------------------------
David,
Thanks very much for looking at this and for the suggestions.
* We see that LOG.warn in our logs and that is certainly an area that we would
like to clean up. However, I think that we need to make {{manageWriterOsCache}}
more pluggable to deal with that. The LOG.warn is telling us that some
functionality is being bypassed, which seems like a valid warning until the
FsDatasetSpi plugins are able to provide that functionality. Perhaps I should
file a new sub-jira on that?
* I fully agree with your comments on the sync(OutputStream) signature and I
like your solution. I have attached a new patch against the latest trunk.
Thanks very much for pointing that out.
In the latest patch, I still check that the streams are an instance of
FileOutputStream in the syncDataOut() and syncChecksumOut() methods. I think
that can be avoided by making ReplicaOutputStreams an abstract class with
abstract methods for syncDataOut() and syncChecksumOut(). Then we could have
ReplicaFileOutputStreams and ReplicaSimulatedOutputStreams that extend the
abstract class and implement the sync methods, and FsDatasetSpi providers could
have their own subclass with their own implementations, as we do.
ReplicaFileOutputStreams would be constructed by
ReplicaInPipeline.createStreams since it currently uses FileOutputStreams.
ReplicaSimulatedOutputStreams would be constructed by SimulatedFSDataset, which
does not currently use FileOutputStreams.
In order to make this work, the abstract class ReplicaOutputStreams would also
have to be generic over the stream type, because the streams are passed into
the constructor.
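
To make the idea concrete, here is a rough sketch of the hierarchy described
above. It is only an illustration, not the attached patch and not the existing
HDFS classes: the generic parameter, the accessor names, the use of
ByteArrayOutputStream for the simulated case, and the syncCount field are all
assumptions made for the example, and both streams are assumed to be non-null.

```java
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical sketch only; names mirror the discussion, not the real HDFS code.
abstract class ReplicaOutputStreams<T extends OutputStream> implements Closeable {
    private final T dataOut;
    private final T checksumOut;

    // The streams are passed in, which is why the class is generic over T.
    protected ReplicaOutputStreams(T dataOut, T checksumOut) {
        this.dataOut = dataOut;
        this.checksumOut = checksumOut;
    }

    public T getDataOut() { return dataOut; }
    public T getChecksumOut() { return checksumOut; }

    // Each provider decides what "sync" means for its own storage.
    public abstract void syncDataOut() throws IOException;
    public abstract void syncChecksumOut() throws IOException;

    @Override
    public void close() throws IOException {
        try {
            dataOut.close();
        } finally {
            checksumOut.close();
        }
    }
}

// File-backed variant: no instanceof check is needed because T is FileOutputStream.
class ReplicaFileOutputStreams extends ReplicaOutputStreams<FileOutputStream> {
    ReplicaFileOutputStreams(FileOutputStream dataOut, FileOutputStream checksumOut) {
        super(dataOut, checksumOut);
    }

    @Override
    public void syncDataOut() throws IOException {
        getDataOut().flush();
        getDataOut().getChannel().force(true); // force data and metadata to disk
    }

    @Override
    public void syncChecksumOut() throws IOException {
        getChecksumOut().flush();
        getChecksumOut().getChannel().force(true);
    }
}

// In-memory stand-in for a simulated dataset: there is nothing to force to disk.
class ReplicaSimulatedOutputStreams extends ReplicaOutputStreams<ByteArrayOutputStream> {
    int syncCount = 0; // assumption: only here so the example is observable

    ReplicaSimulatedOutputStreams(ByteArrayOutputStream dataOut, ByteArrayOutputStream checksumOut) {
        super(dataOut, checksumOut);
    }

    @Override
    public void syncDataOut() throws IOException {
        getDataOut().flush();
        syncCount++;
    }

    @Override
    public void syncChecksumOut() throws IOException {
        getChecksumOut().flush();
        syncCount++;
    }
}
```

With something like this, BlockReceiver would only ever call syncDataOut() and
syncChecksumOut() on whatever ReplicaOutputStreams it is given, and a
FsDatasetSpi provider could supply its own subclass.
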
It seems to me like this is a bit far to go for this patch, and that we should
consider solving the manageWriterOsCache problem before doing something like
this. But I would like to get your feedback and am happy to do it if the
community thinks it is a good solution at this point.
> Make hsync implementation pluggable
> -----------------------------------
>
> Key: HDFS-5868
> URL: https://issues.apache.org/jira/browse/HDFS-5868
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: datanode
> Affects Versions: 2.2.0
> Reporter: Buddy
> Attachments: HDFS-5868-branch-2.patch
>
>
> The current implementation of hsync in BlockReceiver only works if the output
> streams are instances of FileOutputStream. Therefore, there is currently no
> way for an FsDatasetSpi plugin to implement hsync if it is not using standard
> OS files.
> One possible solution is to push the implementation of hsync into the
> ReplicaOutputStreams class. This class is constructed by the
> ReplicaInPipeline, which is constructed by the FsDatasetSpi plugin, so
> it can be extended. Instead of directly calling sync on the output stream,
> BlockReceiver would call ReplicaOutputStreams.sync. The default
> implementation of sync in ReplicaOutputStreams would be the same as the
> current implementation in BlockReceiver.