This is an automated email from the ASF dual-hosted git repository.
pifta pushed a commit to branch HDDS-5447-httpfs
in repository https://gitbox.apache.org/repos/asf/ozone.git
The following commit(s) were added to refs/heads/HDDS-5447-httpfs by this push:
new 8b2a3bd922 HDDS-5966. [HTTPFSGW] Update module doc, and place it in
Ozone project docs (#4250)
8b2a3bd922 is described below
commit 8b2a3bd9220e4f23d14f673d88fdf709692b7896
Author: Zita Dombi <[email protected]>
AuthorDate: Mon Feb 27 12:41:06 2023 +0100
HDDS-5966. [HTTPFSGW] Update module doc, and place it in Ozone project docs
(#4250)
---
hadoop-hdds/docs/content/design/httpfs.md | 31 ++++
hadoop-hdds/docs/content/interface/HttpFS.md | 119 +++++++++++++
hadoop-hdds/docs/content/tools/_index.md | 1 +
.../src/site/markdown/ServerSetup.md.vm | 198 ---------------------
.../src/site/markdown/UsingHttpTools.md | 62 -------
.../httpfsgateway/src/site/markdown/index.md | 54 ------
6 files changed, 151 insertions(+), 314 deletions(-)
diff --git a/hadoop-hdds/docs/content/design/httpfs.md
b/hadoop-hdds/docs/content/design/httpfs.md
new file mode 100644
index 0000000000..ad174199aa
--- /dev/null
+++ b/hadoop-hdds/docs/content/design/httpfs.md
@@ -0,0 +1,31 @@
+---
+title: HttpFS support for Ozone
+summary: HttpFS is a WebHDFS compatible interface that is added as a separate
role to Ozone.
+date: 2023-02-03
+jira: HDDS-5447
+status: implemented
+author: Zita Dombi, Istvan Fajth
+---
+<!--
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License. See accompanying LICENSE file.
+-->
+
+# Abstract
+
+Ozone HttpFS provides an HttpFS-compatible REST API interface to enable
applications
+that are designed to use
[HttpFS](https://hadoop.apache.org/docs/stable/hadoop-hdfs-httpfs/index.html)
+to interact and integrate with Ozone.
+
+# Link
+
+https://issues.apache.org/jira/secure/attachment/13031822/HTTPFS%20interface%20for%20Ozone.pdf
diff --git a/hadoop-hdds/docs/content/interface/HttpFS.md
b/hadoop-hdds/docs/content/interface/HttpFS.md
new file mode 100644
index 0000000000..e413faf03c
--- /dev/null
+++ b/hadoop-hdds/docs/content/interface/HttpFS.md
@@ -0,0 +1,119 @@
+---
+title: HttpFS Gateway
+weight: 7
+menu:
+ main:
+ parent: "Client Interfaces"
+summary: Ozone HttpFS is a WebHDFS-compatible interface implementation. As a separate role, it provides easy integration with Ozone.
+---
+
+<!---
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+
+Ozone HttpFS can be used to integrate Ozone with other tools via a REST API.
+
+## Introduction
+
+Ozone HttpFS is forked from the HDFS HttpFS endpoint implementation
([HDDS-5448](https://issues.apache.org/jira/browse/HDDS-5448)). Ozone HttpFS is
intended to be added optionally as a role in an Ozone cluster, similar to [S3
Gateway]({{< ref "design/s3gateway.md" >}}).
+
+HttpFS is a service that provides a REST HTTP gateway supporting File System
operations (read and write). It is interoperable with the **webhdfs** REST HTTP
API.
+
+HttpFS can be used to access data on an Ozone cluster behind a firewall. For example, the HttpFS service acts as a gateway and is the only system that is allowed to cross the firewall into the cluster.
+
+HttpFS can be used to access data in Ozone using HTTP utilities (such as curl and wget) and HTTP libraries from languages other than Java (such as Perl).
+
+The **webhdfs** client FileSystem implementation can be used to access HttpFS
using the Ozone filesystem command line tool (`ozone fs`) as well as from Java
applications using the Hadoop FileSystem Java API.
+
+HttpFS has built-in security supporting Hadoop pseudo authentication, Kerberos SPNEGO, and other pluggable authentication mechanisms. It also provides Hadoop proxy user support.
+
+
+## Getting started
+
+The HttpFS service itself is a Jetty-based web application that uses the Hadoop FileSystem API to talk to the cluster. It is a separate service that provides access to Ozone via a REST API, and it should be started in addition to the other regular Ozone components.
+
+To try it out, you can start a Docker Compose dev cluster that has an HttpFS
gateway.
+
+Extract the release tarball, go to the `compose/ozone` directory and start the
cluster:
+
+```bash
+docker-compose up -d --scale datanode=3
+```
+
+You should now find the HttpFS gateway in Docker under the name `ozone_httpfs`.
+HttpFS HTTP web-service API calls are HTTP REST calls that map to Ozone file system operations, and they can be issued with any HTTP client such as `curl`.
+
+For example, in the Docker cluster you can execute commands like these:
+
+* `curl -i -X PUT
"http://httpfs:14000/webhdfs/v1/vol1?op=MKDIRS&user.name=hdfs"` creates a
volume called `vol1`.
+
+
+* `$ curl
'http://httpfs-host:14000/webhdfs/v1/user/foo/README.txt?op=OPEN&user.name=foo'`
returns the content of the key `/user/foo/README.txt`.
+
+
+## Supported operations
+
+The tables below list the WebHDFS REST API operations and their support status in Ozone.
+
+### File and Directory Operations
+
+Operation | Support
+--------------------------------|---------------------
+Create and Write to a File | supported
+Append to a File | not implemented in Ozone
+Concat File(s) | not implemented in Ozone
+Open and Read a File | supported
+Make a Directory | supported
+Create a Symbolic Link | not implemented in Ozone
+Rename a File/Directory | supported (with limitations)
+Delete a File/Directory | supported
+Truncate a File | not implemented in Ozone
+Status of a File/Directory | supported
+List a Directory | supported
+List a File | supported
+Iteratively List a Directory | supported
+
+
+### Other File System Operations
+
+Operation | Support
+--------------------------------------|---------------------
+Get Content Summary of a Directory | supported
+Get Quota Usage of a Directory | supported
+Set Quota | not implemented in Ozone FileSystem API
+Set Quota By Storage Type | not implemented in Ozone
+Get File Checksum | unsupported (to be fixed)
+Get Home Directory | unsupported (to be fixed)
+Get Trash Root | unsupported
+Set Permission | not implemented in Ozone FileSystem API
+Set Owner | not implemented in Ozone FileSystem API
+Set Replication Factor | not implemented in Ozone FileSystem API
+Set Access or Modification Time | not implemented in Ozone FileSystem API
+Modify ACL Entries | not implemented in Ozone FileSystem API
+Remove ACL Entries | not implemented in Ozone FileSystem API
+Remove Default ACL | not implemented in Ozone FileSystem API
+Remove ACL | not implemented in Ozone FileSystem API
+Set ACL | not implemented in Ozone FileSystem API
+Get ACL Status | not implemented in Ozone FileSystem API
+Check access | not implemented in Ozone FileSystem API
+
+
+
+## Hadoop user and developer documentation about HttpFS
+
+* [HttpFS Server
Setup](https://hadoop.apache.org/docs/stable/hadoop-hdfs-httpfs/ServerSetup.html)
+
+* [Using HTTP Tools](https://hadoop.apache.org/docs/stable/hadoop-hdfs-httpfs/UsingHttpTools.html)
\ No newline at end of file
diff --git a/hadoop-hdds/docs/content/tools/_index.md
b/hadoop-hdds/docs/content/tools/_index.md
index 12dd7f4faa..133223d9ab 100644
--- a/hadoop-hdds/docs/content/tools/_index.md
+++ b/hadoop-hdds/docs/content/tools/_index.md
@@ -37,6 +37,7 @@ Daemon commands:
stopped.
* **s3g** - Start the S3 compatible REST gateway
* **recon** - The Web UI service of Ozone can be started with this command.
+ * **httpfs** - Start the HttpFS gateway
Client commands:
diff --git a/hadoop-ozone/httpfsgateway/src/site/markdown/ServerSetup.md.vm
b/hadoop-ozone/httpfsgateway/src/site/markdown/ServerSetup.md.vm
deleted file mode 100644
index 2d0a5b8cd2..0000000000
--- a/hadoop-ozone/httpfsgateway/src/site/markdown/ServerSetup.md.vm
+++ /dev/null
@@ -1,198 +0,0 @@
-<!---
- Licensed under the Apache License, Version 2.0 (the "License");
- you may not use this file except in compliance with the License.
- You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
- Unless required by applicable law or agreed to in writing, software
- distributed under the License is distributed on an "AS IS" BASIS,
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- See the License for the specific language governing permissions and
- limitations under the License. See accompanying LICENSE file.
--->
-
-Hadoop HDFS over HTTP - Server Setup
-====================================
-
-This page explains how to quickly setup HttpFS with Pseudo authentication
against a Hadoop cluster with Pseudo authentication.
-
-Install HttpFS
---------------
-
- ~ $ tar xzf httpfs-${project.version}.tar.gz
-
-Configure HttpFS
-----------------
-
-By default, HttpFS assumes that Hadoop configuration files (`core-site.xml &
hdfs-site.xml`) are in the HttpFS configuration directory.
-
-If this is not the case, add to the `httpfs-site.xml` file the
`httpfs.hadoop.config.dir` property set to the location of the Hadoop
configuration directory.
-
-Configure Hadoop
-----------------
-
-Edit Hadoop `core-site.xml` and defined the Unix user that will run the HttpFS
server as a proxyuser. For example:
-
-```xml
- <property>
- <name>hadoop.proxyuser.#HTTPFSUSER#.hosts</name>
- <value>httpfs-host.foo.com</value>
- </property>
- <property>
- <name>hadoop.proxyuser.#HTTPFSUSER#.groups</name>
- <value>*</value>
- </property>
-```
-
-IMPORTANT: Replace `#HTTPFSUSER#` with the Unix user that will start the
HttpFS server.
-
-Restart Hadoop
---------------
-
-You need to restart Hadoop for the proxyuser configuration to become active.
-
-Start/Stop HttpFS
------------------
-
-To start/stop HttpFS, use `hdfs --daemon start|stop httpfs`. For example:
-
- hadoop-${project.version} $ hdfs --daemon start httpfs
-
-NOTE: The script `httpfs.sh` is deprecated. It is now just a wrapper of
-`hdfs httpfs`.
-
-Test HttpFS is working
-----------------------
-
- $ curl -sS
'http://<HTTPFSHOSTNAME>:14000/webhdfs/v1?op=gethomedirectory&user.name=hdfs'
- {"Path":"\/user\/hdfs"}
-
-HttpFS Configuration
---------------------
-
-HttpFS preconfigures the HTTP port to 14000.
-
-HttpFS supports the following [configuration
properties](./httpfs-default.html) in the HttpFS's `etc/hadoop/httpfs-site.xml`
configuration file.
-
-HttpFS over HTTPS (SSL)
------------------------
-
-Enable SSL in `etc/hadoop/httpfs-site.xml`:
-
-```xml
- <property>
- <name>httpfs.ssl.enabled</name>
- <value>true</value>
- <description>
- Whether SSL is enabled. Default is false, i.e. disabled.
- </description>
- </property>
-```
-
-Configure `etc/hadoop/ssl-server.xml` with proper values, for example:
-
-```xml
- <property>
- <name>ssl.server.keystore.location</name>
- <value>${user.home}/.keystore</value>
- <description>Keystore to be used. Must be specified.
- </description>
- </property>
-
- <property>
- <name>ssl.server.keystore.password</name>
- <value></value>
- <description>Must be specified.</description>
- </property>
-
- <property>
- <name>ssl.server.keystore.keypassword</name>
- <value></value>
- <description>Must be specified.</description>
- </property>
-```
-
-The SSL passwords can be secured by a credential provider. See
-[Credential Provider
API](../hadoop-project-dist/hadoop-common/CredentialProviderAPI.html).
-
-You need to create an SSL certificate for the HttpFS server. As the `httpfs`
Unix user, using the Java `keytool` command to create the SSL certificate:
-
- $ keytool -genkey -alias jetty -keyalg RSA
-
-You will be asked a series of questions in an interactive prompt. It will
create the keystore file, which will be named **.keystore** and located in the
`httpfs` user home directory.
-
-The password you enter for "keystore password" must match the value of the
-property `ssl.server.keystore.password` set in the `ssl-server.xml` in the
-configuration directory.
-
-The answer to "What is your first and last name?" (i.e. "CN") must be the
hostname of the machine where the HttpFS Server will be running.
-
-Start HttpFS. It should work over HTTPS.
-
-Using the Hadoop `FileSystem` API or the Hadoop FS shell, use the
`swebhdfs://` scheme. Make sure the JVM is picking up the truststore containing
the public key of the SSL certificate if using a self-signed certificate.
-For more information about the client side settings, see [SSL Configurations
for
SWebHDFS](../hadoop-project-dist/hadoop-hdfs/WebHDFS.html#SSL_Configurations_for_SWebHDFS).
-
-NOTE: Some old SSL clients may use weak ciphers that are not supported by the
HttpFS server. It is recommended to upgrade the SSL client.
-
-Deprecated Environment Variables
---------------------------------
-
-The following environment variables are deprecated. Set the corresponding
-configuration properties instead.
-
-Environment Variable | Configuration Property | Configuration File
-----------------------------|------------------------------|--------------------
-HTTPFS_HTTP_HOSTNAME | httpfs.http.hostname | httpfs-site.xml
-HTTPFS_HTTP_PORT | httpfs.http.port | httpfs-site.xml
-HTTPFS_MAX_HTTP_HEADER_SIZE | hadoop.http.max.request.header.size and
hadoop.http.max.response.header.size | httpfs-site.xml
-HTTPFS_MAX_THREADS | hadoop.http.max.threads | httpfs-site.xml
-HTTPFS_SSL_ENABLED | httpfs.ssl.enabled | httpfs-site.xml
-HTTPFS_SSL_KEYSTORE_FILE | ssl.server.keystore.location | ssl-server.xml
-HTTPFS_SSL_KEYSTORE_PASS | ssl.server.keystore.password | ssl-server.xml
-
-HTTP Default Services
----------------------
-
-Name | Description
--------------------|------------------------------------
-/conf | Display configuration properties
-/jmx | Java JMX management interface
-/logLevel | Get or set log level per class
-/logs | Display log files
-/stacks | Display JVM stacks
-/static/index.html | The static home page
-
-To control the access to servlet `/conf`, `/jmx`, `/logLevel`, `/logs`,
-and `/stacks`, configure the following properties in `httpfs-site.xml`:
-
-```xml
- <property>
- <name>hadoop.security.authorization</name>
- <value>true</value>
- <description>Is service-level authorization enabled?</description>
- </property>
-
- <property>
- <name>hadoop.security.instrumentation.requires.admin</name>
- <value>true</value>
- <description>
- Indicates if administrator ACLs are required to access
- instrumentation servlets (JMX, METRICS, CONF, STACKS).
- </description>
- </property>
-
- <property>
- <name>httpfs.http.administrators</name>
- <value></value>
- <description>ACL for the admins, this configuration is used to control
- who can access the default servlets for HttpFS server. The value
- should be a comma separated list of users and groups. The user list
- comes first and is separated by a space followed by the group list,
- e.g. "user1,user2 group1,group2". Both users and groups are optional,
- so "user1", " group1", "", "user1 group1", "user1,user2 group1,group2"
- are all valid (note the leading space in " group1"). '*' grants access
- to all users and groups, e.g. '*', '* ' and ' *' are all valid.
- </description>
- </property>
-```
\ No newline at end of file
diff --git a/hadoop-ozone/httpfsgateway/src/site/markdown/UsingHttpTools.md
b/hadoop-ozone/httpfsgateway/src/site/markdown/UsingHttpTools.md
deleted file mode 100644
index 3045ad6506..0000000000
--- a/hadoop-ozone/httpfsgateway/src/site/markdown/UsingHttpTools.md
+++ /dev/null
@@ -1,62 +0,0 @@
-<!---
- Licensed under the Apache License, Version 2.0 (the "License");
- you may not use this file except in compliance with the License.
- You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
- Unless required by applicable law or agreed to in writing, software
- distributed under the License is distributed on an "AS IS" BASIS,
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- See the License for the specific language governing permissions and
- limitations under the License. See accompanying LICENSE file.
--->
-
-Hadoop HDFS over HTTP - Using HTTP Tools
-========================================
-
-Security
---------
-
-Out of the box HttpFS supports both pseudo authentication and Kerberos HTTP
SPNEGO authentication.
-
-### Pseudo Authentication
-
-With pseudo authentication the user name must be specified in the
`user.name=<USERNAME>` query string parameter of a HttpFS URL. For example:
-
- $ curl "http://<HTTFS_HOST>:14000/webhdfs/v1?op=homedir&user.name=babu"
-
-### Kerberos HTTP SPNEGO Authentication
-
-Kerberos HTTP SPNEGO authentication requires a tool or library supporting
Kerberos HTTP SPNEGO protocol.
-
-IMPORTANT: If using `curl`, the `curl` version being used must support GSS
(`curl -V` prints out 'GSS' if it supports it).
-
-For example:
-
- $ kinit
- Please enter the password for user@LOCALHOST:
- $ curl --negotiate -u foo
"http://<HTTPFS_HOST>:14000/webhdfs/v1?op=homedir"
- Enter host password for user 'foo':
-
-NOTE: the `-u USER` option is required by the `--negotiate` but it is not
used. Use any value as `USER` and when asked for the password press [ENTER] as
the password value is ignored.
-
-### Remembering Who I Am (Establishing an Authenticated Session)
-
-As most authentication mechanisms, Hadoop HTTP authentication authenticates
users once and issues a short-lived authentication token to be presented in
subsequent requests. This authentication token is a signed HTTP Cookie.
-
-When using tools like `curl`, the authentication token must be stored on the
first request doing authentication, and submitted in subsequent requests. To do
this with curl the `-b` and `-c` options to save and send HTTP Cookies must be
used.
-
-For example, the first request doing authentication should save the received
HTTP Cookies.
-
-Using Pseudo Authentication:
-
- $ curl -c ~/.httpfsauth
"http://<HTTPFS_HOST>:14000/webhdfs/v1?op=homedir&user.name=foo"
-
-Using Kerberos HTTP SPNEGO authentication:
-
- $ curl --negotiate -u foo -c ~/.httpfsauth
"http://<HTTPFS_HOST>:14000/webhdfs/v1?op=homedir"
-
-Then, subsequent requests forward the previously received HTTP Cookie:
-
- $ curl -b ~/.httpfsauth
"http://<HTTPFS_HOST>:14000/webhdfs/v1?op=liststatus"
diff --git a/hadoop-ozone/httpfsgateway/src/site/markdown/index.md
b/hadoop-ozone/httpfsgateway/src/site/markdown/index.md
deleted file mode 100644
index 6eef9e7d30..0000000000
--- a/hadoop-ozone/httpfsgateway/src/site/markdown/index.md
+++ /dev/null
@@ -1,54 +0,0 @@
-<!---
- Licensed under the Apache License, Version 2.0 (the "License");
- you may not use this file except in compliance with the License.
- You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
- Unless required by applicable law or agreed to in writing, software
- distributed under the License is distributed on an "AS IS" BASIS,
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- See the License for the specific language governing permissions and
- limitations under the License. See accompanying LICENSE file.
--->
-
-Hadoop HDFS over HTTP - Documentation Sets
-==========================================
-
-HttpFS is a server that provides a REST HTTP gateway supporting all HDFS File
System operations (read and write). And it is interoperable with the
**webhdfs** REST HTTP API.
-
-HttpFS can be used to transfer data between clusters running different
versions of Hadoop (overcoming RPC versioning issues), for example using Hadoop
DistCP.
-
-HttpFS can be used to access data in HDFS on a cluster behind of a firewall
(the HttpFS server acts as a gateway and is the only system that is allowed to
cross the firewall into the cluster).
-
-HttpFS can be used to access data in HDFS using HTTP utilities (such as curl
and wget) and HTTP libraries Perl from other languages than Java.
-
-The **webhdfs** client FileSystem implementation can be used to access HttpFS
using the Hadoop filesystem command (`hadoop fs`) line tool as well as from
Java applications using the Hadoop FileSystem Java API.
-
-HttpFS has built-in security supporting Hadoop pseudo authentication and HTTP
SPNEGO Kerberos and other pluggable authentication mechanisms. It also provides
Hadoop proxy user support.
-
-How Does HttpFS Works?
-----------------------
-
-HttpFS is a separate service from Hadoop NameNode.
-
-HttpFS itself is Java Jetty web-application.
-
-HttpFS HTTP web-service API calls are HTTP REST calls that map to a HDFS file
system operation. For example, using the `curl` Unix command:
-
-* `$ curl
'http://httpfs-host:14000/webhdfs/v1/user/foo/README.txt?op=OPEN&user.name=foo'`
returns the contents of the HDFS `/user/foo/README.txt` file.
-
-* `$ curl
'http://httpfs-host:14000/webhdfs/v1/user/foo?op=LISTSTATUS&user.name=foo'`
returns the contents of the HDFS `/user/foo` directory in JSON format.
-
-* `$ curl
'http://httpfs-host:14000/webhdfs/v1/user/foo?op=GETTRASHROOT&user.name=foo'`
returns the path `/user/foo/.Trash`, if `/` is an encrypted zone, returns the
path `/.Trash/foo`. See [more
details](../hadoop-project-dist/hadoop-hdfs/TransparentEncryption.html#Rename_and_Trash_considerations)
about trash path in an encrypted zone.
-
-* `$ curl -X POST
'http://httpfs-host:14000/webhdfs/v1/user/foo/bar?op=MKDIRS&user.name=foo'`
creates the HDFS `/user/foo/bar` directory.
-
-User and Developer Documentation
---------------------------------
-
-* [HttpFS Server Setup](./ServerSetup.html)
-
-* [Using HTTP Tools](./UsingHttpTools.html)
-
-