[ https://issues.apache.org/jira/browse/HDFS-11644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968245#comment-15968245 ]
Andrew Wang commented on HDFS-11644:
------------------------------------
Hi Steve, Stack, thanks for replying!
bq. I'd advocate what we've been discussing in HADOOP-9565, some single method
boolean hasFeature(String), where a feature can be probed for. Implement that
and have one probe "syncable" to make: base classes can implement, subclasses
can override that specific string. Add that method as a new Interface and we
can adopt it across all our streams.
bq. at each point at which we encounter a feature, rather, there'd be an
upfront query that could be run before engaging w/ the fs implementation
In this specific case though, HDFS can return either a normal DFSOutputStream,
which can sync, or a DFSStripedOutputStream, which can't sync. So, we need a
feature probe that's at the stream level rather than the FS level.
I thought more about {{instanceof}} checks for an interface and decided that
they're inferior, since we'd potentially end up with a combinatorial explosion
in subclasses (one per combination of supported features). +1 for a probe
method.
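
To make the stream-level probe idea concrete, here is a minimal sketch. All the
names in it ({{FeatureProbe}}, {{ReplicatedStreamSketch}},
{{StripedStreamSketch}}, {{ProbeDemo}}) are hypothetical stand-ins rather than
existing Hadoop classes, and the only feature string assumed is "syncable".

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;

/** Hypothetical probe interface; not an existing Hadoop API. */
interface FeatureProbe {
    boolean hasFeature(String feature);
}

/** Stand-in for a replicated stream, which can sync. */
class ReplicatedStreamSketch extends ByteArrayOutputStream implements FeatureProbe {
    @Override
    public boolean hasFeature(String feature) {
        return "syncable".equals(feature);
    }
}

/** Stand-in for a striped stream: same class hierarchy, but no sync support. */
class StripedStreamSketch extends ReplicatedStreamSketch {
    @Override
    public boolean hasFeature(String feature) {
        // Override only the "syncable" answer; defer everything else to the base class.
        return !"syncable".equals(feature) && super.hasFeature(feature);
    }
}

class ProbeDemo {
    /** Callers ask the stream itself, after it has been opened. */
    static boolean canSync(OutputStream out) {
        return out instanceof FeatureProbe
                && ((FeatureProbe) out).hasFeature("syncable");
    }

    public static void main(String[] args) {
        System.out.println(canSync(new ReplicatedStreamSketch()));
        System.out.println(canSync(new StripedStreamSketch()));
    }
}
```

The subclass changes the answer for one string without touching the type
hierarchy, which is the part {{instanceof}} can't express.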
bq. (though how does this work if tiering changes the underlying storage on us
at runtime?).
That's one issue. Something similar happens with federation: we don't know what
the child FileSystems are until runtime, since they're specified in
configuration. And there's the current issue where features are mutually
exclusive (e.g. EC and Syncable). We often don't know what's supported until
runtime, when we actually open the stream.
bq. Meantime, having DFSStripedOutputStream throw an exception breaking all
that run on top (with no means of querying whether support or not) seems
disruptive.
It's disruptive, but I thought it better than losing data? Bit moot though,
since we intend to fix it properly here :)
BTW, HDFS-11643 might be interesting for HBase as well. We're discussing adding
a boolean parameter to {{create}} to always create a replicated file.
> DFSStripedOutputStream should not implement Syncable
> ----------------------------------------------------
>
> Key: HDFS-11644
> URL: https://issues.apache.org/jira/browse/HDFS-11644
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: erasure-coding
> Affects Versions: 3.0.0-alpha1
> Reporter: Andrew Wang
> Assignee: Manoj Govindassamy
> Labels: hdfs-ec-3.0-must-do
>
> FSDataOutputStream#hsync checks if a stream implements Syncable, and if so,
> calls hsync. Otherwise, it just calls flush. This is used, for instance, by
> YARN's FileSystemTimelineWriter.
> DFSStripedOutputStream extends DFSOutputStream, which implements Syncable.
> However, DFSStripedOS throws a runtime exception when the Syncable methods
> are called.
> We should refactor the inheritance structure so DFSStripedOS does not
> implement Syncable.
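
The behaviour described in the issue can be reproduced with a small
self-contained sketch. The classes below ({{SyncableLike}}, {{BaseOut}},
{{StripedOut}}, {{WrapperOut}}) are hypothetical stand-ins for Syncable,
DFSOutputStream, DFSStripedOutputStream and FSDataOutputStream; they are not
the real implementations and only mirror the inheritance shape.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical stand-in for the Syncable interface. */
interface SyncableLike {
    void hsync() throws IOException;
}

/** Stand-in for the replicated stream: implements the interface and can sync. */
class BaseOut extends ByteArrayOutputStream implements SyncableLike {
    int syncs = 0;

    @Override
    public void hsync() throws IOException {
        syncs++; // a real stream would persist data here
    }
}

/** Stand-in for the striped stream: inherits the interface but cannot sync. */
class StripedOut extends BaseOut {
    @Override
    public void hsync() {
        throw new UnsupportedOperationException("hsync is not supported");
    }
}

/** Stand-in for the wrapper: dispatches on an instanceof check. */
class WrapperOut {
    private final OutputStream wrapped;

    WrapperOut(OutputStream wrapped) {
        this.wrapped = wrapped;
    }

    /** Returns which path was taken, for illustration only. */
    String hsync() throws IOException {
        if (wrapped instanceof SyncableLike) {
            ((SyncableLike) wrapped).hsync();
            return "hsync";
        }
        wrapped.flush(); // fallback for streams that do not claim sync support
        return "flush";
    }
}
```

Because {{StripedOut}} inherits the interface, the {{instanceof}} check passes
and the caller gets a runtime exception instead of the flush fallback.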