[ 
https://issues.apache.org/jira/browse/HDFS-5751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868425#comment-13868425
 ] 

David Powell commented on HDFS-5751:
------------------------------------

The FsDatasetSpi interface is valuable in that it allows people to experiment 
with or deliver solutions on alternate storage systems without needing to make 
and maintain changes to the rest of the Datanode.  Removing such an interface 
-- effectively fixing the I/O path from the high level (HDFS block) to a low 
level (filesystem) -- seems like a contradictory step given the changing nature 
of storage today and some of the efforts made recently to keep pace with it 
(most notably HDFS-2832).
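
As a rough illustration of the kind of boundary described above, here is a
minimal Java sketch. It is not the actual FsDatasetSpi API: the names
BlockDataset, MapBackedDataset, SimulatedDataset and BlockReader are
hypothetical, and the two methods are a drastic simplification chosen only to
show that the caller does not change when the storage backend does.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, heavily simplified stand-in for a dataset SPI.
// This is NOT the real FsDatasetSpi method set.
interface BlockDataset {
    void writeBlock(long blockId, byte[] data);
    byte[] readBlock(long blockId);
}

// Stand-in for an alternate storage backend: keeps full block contents in a map.
final class MapBackedDataset implements BlockDataset {
    private final Map<Long, byte[]> blocks = new HashMap<>();

    @Override
    public void writeBlock(long blockId, byte[] data) {
        blocks.put(blockId, data.clone());
    }

    @Override
    public byte[] readBlock(long blockId) {
        byte[] data = blocks.get(blockId);
        if (data == null) {
            throw new IllegalArgumentException("unknown block " + blockId);
        }
        return data.clone();
    }
}

// Simulated backend: records only each block's length and returns zeroes on read.
final class SimulatedDataset implements BlockDataset {
    private final Map<Long, Integer> lengths = new HashMap<>();

    @Override
    public void writeBlock(long blockId, byte[] data) {
        lengths.put(blockId, data.length);
    }

    @Override
    public byte[] readBlock(long blockId) {
        Integer length = lengths.get(blockId);
        if (length == null) {
            throw new IllegalArgumentException("unknown block " + blockId);
        }
        return new byte[length]; // Java zero-fills new arrays
    }
}

// The caller depends only on the interface, so either backend can be supplied.
final class BlockReader {
    private final BlockDataset dataset;

    BlockReader(BlockDataset dataset) {
        this.dataset = dataset;
    }

    long storedLength(long blockId) {
        return dataset.readBlock(blockId).length;
    }
}
```

Constructing BlockReader with either implementation leaves the calling code
untouched, which is the property an implementor of an alternate backend
relies on.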

The past month or two has been a bit of a heyday for datanode development, and 
as I've scrambled to keep up with the various changes, both the difficulty and 
importance of maintaining an interface here have been thrown into relief.  In 
HDFS-5194, amidst a series of fairly mechanical matters, I suggest 
qualitatively that the design of FsDatasetSpi is something that could use some 
attention.  Though I and other FsDatasetSpi implementors are especially 
sensitive to churn along this boundary, pursuing changes that reduce 
maintenance costs while preserving an interface that vendors and innovators 
could use strikes me as a better path for all parties in the long term.  (And 
with the holidays and large merges behind me, I hope to return to finding ways 
of doing that.)

All that said, I half agree with this JIRA.  The FsVolumeSpi as an interface 
has always struck me as a little odd since it was consumed only by the 
FsDatasetImpl and FsDatasetSpi-external components that were specific to the 
FsDatasetImpl implementation.  In recent weeks it appears to have taken on new 
consumers, though, so I need to see if my criticisms of it are still valid.


> Remove the FsDatasetSpi and FsVolumeImpl interfaces
> ---------------------------------------------------
>
>                 Key: HDFS-5751
>                 URL: https://issues.apache.org/jira/browse/HDFS-5751
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode, test
>    Affects Versions: 3.0.0
>            Reporter: Arpit Agarwal
>
> The in-memory block map and disk interface portions of the DataNode have been 
> abstracted out into an {{FsDatasetSpi}} interface, which further uses 
> {{FsVolumeSpi}} to represent individual volumes.
> The abstraction is useful as it allows DataNode tests to use a 
> {{SimulatedFSDataset}} which does not write any data to disk. Instead it just 
> stores block metadata in memory and returns zeroes for all reads. This is 
> useful for both unit testing and for simulating arbitrarily large datanodes 
> without having to provision real disk capacity.
> A 'real' DataNode uses {{FsDatasetImpl}}. Both {{FsDatasetImpl}} and 
> {{SimulatedFSDataset}} implement {{FsDatasetSpi}}.
> However there are a few problems with this approach:
> # Using the factory class significantly complicates the code flow for the 
> common case. This makes the code harder to understand and debug.
> # There is an additional burden of maintaining two different dataset 
> implementations.
> # Fidelity between the two implementations is poor.
> Instead we can eliminate the SPIs and just hide the disk read/write routines 
> with a dependency injection framework like Google Guice.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
