[
https://issues.apache.org/jira/browse/HADOOP-2885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576462#action_12576462
]
sanjay.radia edited comment on HADOOP-2885 at 3/7/08 5:24 PM:
--------------------------------------------------------------
Here are the three proposals on the table, with their pros and cons.
Terminology: I refer to implementations of FileSystem (e.g. DistributedFileSystem) as
wrappers.
h1. Proposal 1: No HDFS in core
core
org.apache.hadoop.{io,conf,ipc,util,fs}
fs contains the kfs, s3 wrappers etc. BUT no HDFS classes.
FileSystem.get(conf) constructs DistributedFileSystem via dynamic class
loading.
hdfs
org.apache.hadoop.fs.hdfs contains client side and server side
Will generate 2 jars: hdfs-client.jar and hdfs-server.jar
mapred
org.apache.hadoop.mapred
Pros:
Can rev the HDFS client protocol by merely supplying a new jar.
(note that in practice this is not that useful in a distributed system,
since you have to distribute the updated protocol jar to all machines
running the application).
The hdfs protocol is not visible in core src tree
javadoc == ALL the classes in core
Cons:
App needs 2 jars: core.jar and hdfs-client.jar
Structure is not similar to fs.kfs and fs.s3
Harder to make DistributedFileSystem public if we wish, since it is not sitting
in core (I don't think we should make it public anyway)
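The dynamic class loading that Proposal 1 relies on can be sketched as below. This is an illustrative stand-in, not Hadoop's actual implementation: java.util.Properties plays the role of Hadoop's Configuration, and the `fs.<scheme>.impl` key naming is an assumption. The point is that core only needs the implementation class name at compile time, so hdfs-client.jar is required on the classpath only at runtime.

```java
import java.util.Properties;

public class FsLoaderSketch {
    // Resolve a scheme (e.g. "hdfs") to an implementation class named in
    // the configuration, and instantiate it reflectively. Core never
    // references DistributedFileSystem directly, so the HDFS client jar
    // can be supplied (or revved) independently.
    static Object getFileSystem(Properties conf, String scheme) throws Exception {
        String className = conf.getProperty("fs." + scheme + ".impl");
        // Throws ClassNotFoundException if the implementation jar
        // (e.g. hdfs-client.jar) is missing from the classpath.
        Class<?> clazz = Class.forName(className);
        return clazz.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        Properties conf = new Properties();
        // java.util.ArrayList stands in for DistributedFileSystem here,
        // purely so the sketch runs without any Hadoop jars.
        conf.setProperty("fs.hdfs.impl", "java.util.ArrayList");
        Object fs = getFileSystem(conf, "hdfs");
        System.out.println(fs.getClass().getName());
    }
}
```

This also shows the con: the loading fails at runtime, not compile time, when the second jar is absent.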
h1. Proposal 2: Client side HDFS [wrapper and protocol] in core
core
org.apache.hadoop.{io,conf,ipc,util,fs}
fs.hdfs contains DistributedFileSystem and DFSClient
fs contains the kfs, s3 wrappers etc.
hdfs
org.apache.hadoop.fs.hdfs contains server side only
mapred
org.apache.hadoop.mapred
Pros:
Apps need only one jar - core
Structure is *partially* similar to fs.kfs and fs.s3
*Partially* and not *fully* similar because DFSClient is in core's fs.hdfs
The other fs wrappers do not contain their protocols
Easier to make DistributedFileSystem public if we wish, since it is sitting
in core (I don't think we should make it public anyway)
Cons:
Revving the HDFS protocol requires updating core
The hdfs protocol is visible in core src tree
core's javadoc will need to exclude DFSClient and DistributedFileSystem
h1. Proposal 3: HDFS Client Wrapper in core, HDFS protocol is separate
core
org.apache.hadoop.{io,conf,ipc,util,fs}
fs.hdfs contains DistributedFileSystem (but NOT DFSClient)
Structure is similar to fs.kfs and fs.s3 in that a wrapper for each file
system sits in core's fs.
hdfs
org.apache.hadoop.fs.hdfs contains server side and DFSClient
Two jars
mapred
org.apache.hadoop.mapred
Pros:
Can rev the HDFS client protocol by merely supplying a new jar
The hdfs protocol is not visible in core src tree
Structure is similar to fs.kfs and fs.s3
Easier to make DistributedFileSystem public if we wish, since it is sitting
in core (I don't think we should make it public anyway)
Cons:
App needs core jar and hdfs-client jar
Circular dependency between core jar and hdfs-client jar
core's javadoc will need to exclude DistributedFileSystem
> Restructure the hadoop.dfs package
> ----------------------------------
>
> Key: HADOOP-2885
> URL: https://issues.apache.org/jira/browse/HADOOP-2885
> Project: Hadoop Core
> Issue Type: Sub-task
> Components: dfs
> Reporter: Sanjay Radia
> Assignee: Sanjay Radia
> Priority: Minor
> Fix For: 0.17.0
>
> Attachments: Prototype dfs package.png
>
>
> This Jira proposes restructuring the package hadoop.dfs.
> 1. Move all server side and internal protocols (NN-DN etc.) to
> hadoop.dfs.server.*
> 2. Further breakdown of dfs.server.
> - dfs.server.namenode.*
> - dfs.server.datanode.*
> - dfs.server.balancer.*
> - dfs.server.common.* - stuff shared between the various servers
> - dfs.protocol.* - internal protocol between DN, NN and Balancer etc.
> 3. Client interface:
> - hadoop.dfs.DistributedFileSystem.java
> - hadoop.dfs.ChecksumDistributedFileSystem.java
> - hadoop.dfs.HftpFileSystem.java
> - hadoop.dfs.protocol.* - the client side protocol