[
https://issues.apache.org/jira/browse/KNOX-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713488#comment-13713488
]
Maksim Kononenko edited comment on KNOX-22 at 7/19/13 9:31 AM:
---------------------------------------------------------------
Here is a table I found that summarizes the HDFS APIs:
File System   Comm. Method   Scheme / Prefix / Port   Read / Write   Cross Version
HDFS          RPC            hdfs://...:8020          Read / Write   Same HDFS version only
HFTP          HTTP           hftp://...:50070         Read only      Version independent
WebHDFS       HTTP (REST)    webhdfs://...:50070      Read / Write   Version independent
1. HDFS.
It is designed to use sockets as the transport mechanism and has "pluggable"
marshalling protocol support.
Two marshalling protocols are implemented:
- Google Protocol Buffers;
- raw byte read/write:
Code extract:
public void write(DataOutput out) throws IOException {
  out.writeLong(rpcVersion);                          // RPC protocol version
  UTF8.writeString(out, declaringClassProtocolName);  // name of the protocol (interface) being called
  UTF8.writeString(out, methodName);                  // method being invoked
  out.writeLong(clientVersion);                       // client's version of the protocol
  out.writeInt(clientMethodsHash);                    // hash of the client's method signatures
  out.writeInt(parameterClasses.length);              // number of arguments
  for (int i = 0; i < parameterClasses.length; i++) {
    // each argument is serialized via Hadoop's ObjectWritable
    ObjectWritable.writeObject(out, parameters[i], parameterClasses[i],
                               conf, true);
  }
}
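For completeness, here is a sketch of what the matching deserialization side might look like, assuming the fields are read back in the same order they were written above. This is my reconstruction, not an extract from the source; the field names simply mirror the write() extract.
  public void readFields(DataInput in) throws IOException {
    rpcVersion = in.readLong();
    declaringClassProtocolName = UTF8.readString(in);
    methodName = UTF8.readString(in);
    clientVersion = in.readLong();
    clientMethodsHash = in.readInt();
    parameters = new Object[in.readInt()];
    parameterClasses = new Class[parameters.length];
    ObjectWritable objectWritable = new ObjectWritable();
    for (int i = 0; i < parameters.length; i++) {
      // each argument comes back through ObjectWritable, which also carries its declared class
      parameters[i] = ObjectWritable.readObject(in, objectWritable, conf);
      parameterClasses[i] = objectWritable.getDeclaredClass();
    }
  }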
However, the use of Google Protocol Buffers is hardcoded; I did not find any
mechanism for switching between the two marshalling protocols.
This API requires matching client and server versions.
2. HFTP.
Works over plain HTTP.
Here is an example of the URL generated for the "ls" command:
http://host:50070/listPaths/?ugi=root,root
The message payload is an XML document.
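Since HFTP is plain HTTP returning XML, a minimal sketch of fetching and walking such a listing could look like the following. The host, port and ugi value come from the example URL above; the class name and the assumption that each entry carries a "path" attribute are mine.
  import java.io.InputStream;
  import java.net.URL;
  import javax.xml.parsers.DocumentBuilderFactory;
  import org.w3c.dom.Document;
  import org.w3c.dom.Element;
  import org.w3c.dom.Node;
  import org.w3c.dom.NodeList;

  public class HftpListSketch {
    public static void main(String[] args) throws Exception {
      // same URL shape as the example above
      URL url = new URL("http://host:50070/listPaths/?ugi=root,root");
      try (InputStream in = url.openStream()) {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().parse(in);
        // walk the children of the root element of the XML payload
        NodeList nodes = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < nodes.getLength(); i++) {
          Node n = nodes.item(i);
          if (n instanceof Element) {
            Element e = (Element) n;
            // print the element name and its "path" attribute (attribute name is an assumption)
            System.out.println(e.getTagName() + " " + e.getAttribute("path"));
          }
        }
      }
    }
  }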
3. WebHDFS.
Works over HTTP (REST).
I tried to configure the hadoop CLI to work through the gateway, but my attempt
failed.
I found the following in the code of the webhdfs FileSystem API: the connection
URL is formed as
"http" + nnAddr.getHostName() + ":" + nnAddr.getPort() + "/webhdfs/v1/" + path
+ '?' + query
The strings in quotes are hardcoded, so we cannot change the scheme or the
context path.
As an authentication mechanism, the Hadoop CLI supports only Kerberos.
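For reference, here is a minimal sketch of issuing the same kind of request directly, mirroring the hardcoded concatenation above. The op and user.name query parameters follow the public WebHDFS REST API; host, port, path and the user are placeholders, and the "://" separator is added so the URL is valid.
  import java.io.BufferedReader;
  import java.io.InputStreamReader;
  import java.net.HttpURLConnection;
  import java.net.URL;

  public class WebHdfsListSketch {
    public static void main(String[] args) throws Exception {
      String host = "host";    // NameNode host (nnAddr.getHostName() in the FileSystem code)
      int port = 50070;        // NameNode HTTP port (nnAddr.getPort())
      String path = "tmp";     // HDFS path to list, relative to the root
      String query = "op=LISTSTATUS&user.name=root";

      // mirrors the hardcoded concatenation: scheme + host + port + "/webhdfs/v1/" + path + '?' + query
      URL url = new URL("http://" + host + ":" + port + "/webhdfs/v1/" + path + "?" + query);
      HttpURLConnection conn = (HttpURLConnection) url.openConnection();
      conn.setRequestMethod("GET");
      try (BufferedReader reader =
               new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
        String line;
        while ((line = reader.readLine()) != null) {
          System.out.println(line);  // JSON FileStatuses response body
        }
      } finally {
        conn.disconnect();
      }
    }
  }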
> Invoke HDFS via gateway using hadoop CLI and FileSystem API
> -----------------------------------------------------------
>
> Key: KNOX-22
> URL: https://issues.apache.org/jira/browse/KNOX-22
> Project: Apache Knox
> Issue Type: New Feature
> Components: ClientDSL
> Affects Versions: 0.2.0
> Reporter: Kevin Minder
> Assignee: Maksim Kononenko
>
> From BUG-4301
> It should be possible to use the existing HDFS clients to access HDFS via the
> gateway. These existing clients are the hadoop cli and the FileSystem Java
> API.