[
https://issues.apache.org/jira/browse/HDFS-5751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13870043#comment-13870043
]
David Powell commented on HDFS-5751:
------------------------------------
Alas, I am intimately familiar with the reimplementation necessary, and wish
there were less of it to do and to maintain. That said, precluding alternate
implementations because creating one would require more than the ideal amount
of work feels like throwing the baby out with the bathwater.
Moving the abstraction lower is along the lines of what I had in mind when I
suggested the middle ground of changes that reduce mainline maintenance burden
while preserving a usable interface for others. I think the lower surface of
the official FsDatasetImpl is far too low, however, and that comparing HDFS
with ext3fs both underestimates the complexity and modularity of HDFS and
overestimates the versatility of the simple interface a traditional filesystem
consumes. That is, I think there is a class of problems that would lead one
to replace a traditional filesystem entirely, but that could be solved much
more elegantly in HDFS given its components' architectural separation.
> Remove the FsDatasetSpi and FsVolumeSpi interfaces
> --------------------------------------------------
>
> Key: HDFS-5751
> URL: https://issues.apache.org/jira/browse/HDFS-5751
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode, test
> Affects Versions: 3.0.0
> Reporter: Arpit Agarwal
>
> The in-memory block map and disk interface portions of the DataNode have been
> abstracted out into an {{FsDatasetSpi}} interface, which further uses
> {{FsVolumeSpi}} to represent individual volumes.
> The abstraction is useful as it allows DataNode tests to use a
> {{SimulatedFSDataset}} which does not write any data to disk. Instead it just
> stores block metadata in memory and returns zeroes for all reads. This is
> useful for both unit testing and for simulating arbitrarily large datanodes
> without having to provision real disk capacity.
> A 'real' DataNode uses {{FsDatasetImpl}}. Both {{FsDatasetImpl}} and
> {{SimulatedFSDataset}} implement {{FsDatasetSpi}}.
> However, there are a few problems with this approach:
> # Using the factory class significantly complicates the code flow for the
> common case. This makes the code harder to understand and debug.
> # There is an additional burden of maintaining two different dataset
> implementations.
> # Fidelity between the two implementations is poor.
> Instead, we can eliminate the SPIs and just hide the disk read/write routines
> with a dependency injection framework like Google Guice.
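
For readers unfamiliar with the arrangement described in the issue, the
following is a minimal sketch of the pattern only: one interface, a simulated
implementation that keeps block metadata in memory and returns zeroes on read,
a stand-in for the 'real' implementation, and a factory that picks between
them. Every name here (BlockStore, ZeroFillBlockStore, RetainingBlockStore,
BlockStoreFactory) is hypothetical and chosen for illustration. None of it is
the actual Hadoop API, and the real FsDatasetSpi is far larger.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, heavily simplified stand-in for the role FsDatasetSpi plays.
// These are NOT Hadoop types or method signatures.
interface BlockStore {
    void writeBlock(long blockId, byte[] data);
    byte[] readBlock(long blockId);
}

// Analogue of a simulated dataset: records only each block's length
// and answers every read with a zero-filled buffer of that length.
class ZeroFillBlockStore implements BlockStore {
    private final Map<Long, Integer> lengths = new HashMap<>();

    @Override
    public void writeBlock(long blockId, byte[] data) {
        lengths.put(blockId, data.length); // metadata only; payload is discarded
    }

    @Override
    public byte[] readBlock(long blockId) {
        Integer length = lengths.get(blockId);
        if (length == null) {
            throw new IllegalArgumentException("unknown block " + blockId);
        }
        return new byte[length]; // Java zero-initializes new arrays
    }
}

// Analogue of the 'real' dataset. It retains the payload, but in memory
// rather than on disk, so that the sketch stays self-contained.
class RetainingBlockStore implements BlockStore {
    private final Map<Long, byte[]> blocks = new HashMap<>();

    @Override
    public void writeBlock(long blockId, byte[] data) {
        blocks.put(blockId, data.clone());
    }

    @Override
    public byte[] readBlock(long blockId) {
        byte[] data = blocks.get(blockId);
        if (data == null) {
            throw new IllegalArgumentException("unknown block " + blockId);
        }
        return data.clone();
    }
}

// The factory indirection: callers ask for a BlockStore and never name
// a concrete implementation themselves.
final class BlockStoreFactory {
    private BlockStoreFactory() {}

    static BlockStore create(boolean simulated) {
        return simulated ? new ZeroFillBlockStore() : new RetainingBlockStore();
    }
}
```

In this sketch, a test would call BlockStoreFactory.create(true) and production
code would call BlockStoreFactory.create(false). Both then program against
BlockStore alone, which is the extra hop in the code flow that the first
problem in the list above refers to.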