errose28 commented on code in PR #7529: URL: https://github.com/apache/ozone/pull/7529#discussion_r1925723835
########## hadoop-hdds/docs/content/feature/Short-Circuit-Read.md: ########## @@ -0,0 +1,78 @@ +--- +title: "Short Circuit Local Read in Datanode" +weight: 2 +menu: + main: + parent: Features +summary: Introduction to Ozone Datanode Short Circuit Local Read Feature +--- +<!--- + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +--> + +By default, client reads data over GRPC from the Datanode. When the client asks the Datanode to read a file, the DataNode reads that file off of the disk and sends the data to the client over a GRPC connection. Review Comment: ```suggestion By default, client reads data over GRPC from the Datanode. When the client asks the Datanode to read a file, the Datanode reads that file off of the disk and sends the data to the client over a GRPC connection. ``` ########## hadoop-hdds/docs/content/design/short-circuit-read.md: ########## @@ -0,0 +1,29 @@ +--- +title: Short Circuit Local Read in DN +summary: Support read data from local disk file directly when the client and data are co-located on the same server +date: 2024-12-04 +jira: HDDS-10685 +status: implemented + +--- +<!-- + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. 
+ You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. See accompanying LICENSE file. +--> + +# Abstract + +This “short-circuit” local read feature bypassing the Datanode, allows the client to read the file from local disk directly when the client is co-located with the data on the same server. Review Comment: ```suggestion The short-circuit local read feature bypasses the Datanode, allowing the client to read the file from local disk directly when the client is co-located with the data on the same server. ``` ########## hadoop-hdds/docs/content/feature/Short-Circuit-Read.md: ########## @@ -0,0 +1,78 @@ +--- +title: "Short Circuit Local Read in Datanode" +weight: 2 +menu: + main: + parent: Features +summary: Introduction to Ozone Datanode Short Circuit Local Read Feature +--- +<!--- + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +--> + +By default, client reads data over GRPC from the Datanode. 
When the client asks the Datanode to read a file, the DataNode reads that file off of the disk and sends the data to the client over a GRPC connection. + +This “short-circuit” local read feature will bypass the DataNode, allowing the client to read the file from local disk directly when the client is co-located with the data on the same server. Review Comment: ```suggestion Short-circuit local read will bypass the Datanode, allowing the client to read the file from the local disk directly when the client is co-located with the data on the same server. ``` ########## hadoop-hdds/docs/content/feature/Short-Circuit-Read.md: ########## @@ -0,0 +1,78 @@ +--- +title: "Short Circuit Local Read in Datanode" +weight: 2 +menu: + main: + parent: Features +summary: Introduction to Ozone Datanode Short Circuit Local Read Feature +--- +<!--- + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +--> + +By default, client reads data over GRPC from the Datanode. When the client asks the Datanode to read a file, the DataNode reads that file off of the disk and sends the data to the client over a GRPC connection. 
+ +This “short-circuit” local read feature will bypass the DataNode, allowing the client to read the file from local disk directly when the client is co-located with the data on the same server. + +Short-circuit local read can provide a substantial performance boost to many applications, by removing the overhead of network communication. + +## Prerequisite + +Short-circuit local reads make use of a UNIX domain socket. This is a special path in the filesystem that allows the client and the DataNodes to communicate. + +The Hadoop native library `libhadoop.so` provides support to for Unix domain sockets. Please refer to Hadoop's [Native Libraries Guide](https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/NativeLibraries.html) for details. + +The Hadoop version used in Ozone is defined by `hadoop.version` in pom.xml. Before enabling short-circuit local reads, find the `libhadoop.so` from the corresponding version Hadoop release package, put it under one of the directories specified by Java `java.library.path` property. The default value of `java.library.path` depends on the OS and Java version. For example, on Linux with OpenJDK 8 it is `/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib`. Review Comment: Does `hadoop.version` have a default value, or are users required to populate it? ```suggestion The Hadoop version used in Ozone is defined by `hadoop.version` in pom.xml. Before enabling short-circuit local reads, find the `libhadoop.so` from the release package of the corresponding Hadoop version, put it under one of the directories specified by Java's `java.library.path` property. The default value of `java.library.path` depends on the OS and Java version. For example, on Linux with OpenJDK 8 it is `/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib`. 
``` ########## hadoop-hdds/docs/content/feature/Short-Circuit-Read.md: ########## @@ -0,0 +1,78 @@ +--- +title: "Short Circuit Local Read in Datanode" +weight: 2 +menu: + main: + parent: Features +summary: Introduction to Ozone Datanode Short Circuit Local Read Feature +--- +<!--- + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +--> + +By default, client reads data over GRPC from the Datanode. When the client asks the Datanode to read a file, the DataNode reads that file off of the disk and sends the data to the client over a GRPC connection. + +This “short-circuit” local read feature will bypass the DataNode, allowing the client to read the file from local disk directly when the client is co-located with the data on the same server. + +Short-circuit local read can provide a substantial performance boost to many applications, by removing the overhead of network communication. + +## Prerequisite + +Short-circuit local reads make use of a UNIX domain socket. This is a special path in the filesystem that allows the client and the DataNodes to communicate. + +The Hadoop native library `libhadoop.so` provides support to for Unix domain sockets. 
Please refer to Hadoop's [Native Libraries Guide](https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/NativeLibraries.html) for details. + +The Hadoop version used in Ozone is defined by `hadoop.version` in pom.xml. Before enabling short-circuit local reads, find the `libhadoop.so` from the corresponding version Hadoop release package, put it under one of the directories specified by Java `java.library.path` property. The default value of `java.library.path` depends on the OS and Java version. For example, on Linux with OpenJDK 8 it is `/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib`. + +The `ozone checknative` command can be used to detect whether `libhadoop.so` can be found and loaded successfully by Ozone service. + + +## Configuration + +Short-circuit local reads need to be configured on both the DataNode and the client. By default, it is disabled. + +```XML +<property> + <name>ozone.client.read.short-circuit</name> + <value>false</value> + <description>Disable or enable the short-circuit local read feature.</description> +</property> +``` + +It makes use of a UNIX domain socket, a special path in the filesystem. You will need to set a path to this socket. + +```XML +<property> + <name>ozone.domain.socket.path</name> + <value></value> + <description> This is a path to a UNIX domain socket that will be used for + communication between the DataNode and local Ozone clients. + If the string "_PORT" is present in this path, it will be replaced by the TCP port of the DataNode. + </description> +</property> +``` + +The DataNode needs to be able to create this path. On the other hand, it should not be possible for any user except the Ozone user(user who launches Ozone service) or root to create this path. For this reason, paths under `/var/run` or `/var/lib` are often used. Review Comment: ```suggestion The Datanode needs to be able to create this path. 
On the other hand, it should not be possible for any user except the user who launches Ozone service or root to create this path. For this reason, paths under `/var/run` or `/var/lib` are often used. ``` ########## hadoop-hdds/docs/content/feature/Short-Circuit-Read.md: ########## @@ -0,0 +1,78 @@ +--- +title: "Short Circuit Local Read in Datanode" +weight: 2 +menu: + main: + parent: Features +summary: Introduction to Ozone Datanode Short Circuit Local Read Feature +--- +<!--- + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +--> + +By default, client reads data over GRPC from the Datanode. When the client asks the Datanode to read a file, the DataNode reads that file off of the disk and sends the data to the client over a GRPC connection. + +This “short-circuit” local read feature will bypass the DataNode, allowing the client to read the file from local disk directly when the client is co-located with the data on the same server. + +Short-circuit local read can provide a substantial performance boost to many applications, by removing the overhead of network communication. + +## Prerequisite Review Comment: Let's combine this all under the configuration section and make it a step-by-step list, which will be easier for users to follow. 
########## hadoop-hdds/docs/content/feature/Short-Circuit-Read.md: ########## @@ -0,0 +1,76 @@ +--- +title: "Short Circuit Local Read in Datanode" +weight: 2 +menu: + main: + parent: Features +summary: Introduction to Ozone Datanode Short Circuit Local Read Feature +--- +<!--- + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +--> + +Current in Ozone, client reads data over GRPC from Datanode. When the client asks the DataNode to read a file, the DataNode reads that file off of the disk and sends the data to the client over GRPC connection. + +This “short-circuit” local read feature will bypass the DataNode, allowing the client to read the file from local disk directly when the client is co-located with the data on the same server. + +Short-circuit local read can provide a substantial performance boost to many applications, by removing the overhead of network communication. + +## Prerequisite + +Short-circuit local reads make use of a UNIX domain socket. This is a special path in the filesystem that allows the client and the DataNodes to communicate. + +The Hadoop native library "libhadoop.so" provides the support to use the Unix domain socket. 
Please refer to Native Libraries ("https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/NativeLibraries.html") for details of this library. + +Before enabling short-circuit local reads, you must have a proper libhadoop.so, and make sure it's under the directory where Java can find and load it through "System.loadLibrary()" call. + +The paths that Java will search for libraries are specified by the "java.library.path" property. The default value of "java.library.path" depends on the OS and Java version. For example, on Linux with OpenJDK 8 it is `/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib`. + +Command "ozone checknative" can be used to detect whether libhadoop.so can be loaded successfully by Ozone service. + +## Configuration + +Short-circuit local reads need to be configured on both the DataNode and the client. By default, it is disabled. Review Comment: Shipping with it disabled in Ozone is fine. My point is that copy/pasting the config snippet provided in the doc should be enough to make the feature work. This is currently not the case because the doc has it disabled. We should combine both config examples here into one block and provide a default path. The [Hadoop docs](https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/ShortCircuitLocalReads.html) do this under the Example Configuration section. ########## hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/OzoneClientConfig.java: ########## @@ -39,6 +39,60 @@ public class OzoneClientConfig { private static final Logger LOG = LoggerFactory.getLogger(OzoneClientConfig.class); + public static final boolean OZONE_READ_SHORT_CIRCUIT_DEFAULT = false; Review Comment: Are these code changes supposed to be here?
Let's keep the code and docs PRs separate. ########## hadoop-hdds/docs/content/feature/Short-Circuit-Read.md: ########## @@ -0,0 +1,78 @@ +--- +title: "Short Circuit Local Read in Datanode" +weight: 2 +menu: + main: + parent: Features +summary: Introduction to Ozone Datanode Short Circuit Local Read Feature +--- +<!--- + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +--> + +By default, client reads data over GRPC from the Datanode. When the client asks the Datanode to read a file, the DataNode reads that file off of the disk and sends the data to the client over a GRPC connection. + +This “short-circuit” local read feature will bypass the DataNode, allowing the client to read the file from local disk directly when the client is co-located with the data on the same server. Review Comment: Is short circuit supposed to have a dash? We should standardize this between the title of the page and its content. Also, I don't think it should be quoted.
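For the combined example requested in the Configuration comment above, a minimal sketch could look like the following. The property names are the ones already used in the quoted doc; the socket path `/var/lib/ozone/ozone_dn_socket` is a hypothetical placeholder rather than a shipped default, and it assumes the parent directory already exists with permissions that pass the socket path checks.

```XML
<!-- Sketch only: the socket path below is an assumed placeholder, not an Ozone default. -->
<property>
  <name>ozone.client.read.short-circuit</name>
  <value>true</value>
</property>
<property>
  <name>ozone.domain.socket.path</name>
  <value>/var/lib/ozone/ozone_dn_socket</value>
</property>
```

Per the quoted doc, the same settings would need to be present on both the Datanode and the client for a copy/pasted snippet to work.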
########## hadoop-hdds/docs/content/feature/Short-Circuit-Read.md: ########## @@ -0,0 +1,78 @@ +--- +title: "Short Circuit Local Read in Datanode" +weight: 2 +menu: + main: + parent: Features +summary: Introduction to Ozone Datanode Short Circuit Local Read Feature +--- +<!--- + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +--> + +By default, client reads data over GRPC from the Datanode. When the client asks the Datanode to read a file, the DataNode reads that file off of the disk and sends the data to the client over a GRPC connection. + +This “short-circuit” local read feature will bypass the DataNode, allowing the client to read the file from local disk directly when the client is co-located with the data on the same server. + +Short-circuit local read can provide a substantial performance boost to many applications, by removing the overhead of network communication. Review Comment: ```suggestion Short-circuit local read can provide a substantial performance boost to many applications by removing the overhead of network communication. 
``` ########## hadoop-hdds/docs/content/feature/Short-Circuit-Read.md: ########## @@ -0,0 +1,78 @@ +--- +title: "Short Circuit Local Read in Datanode" +weight: 2 +menu: + main: + parent: Features +summary: Introduction to Ozone Datanode Short Circuit Local Read Feature +--- +<!--- + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +--> + +By default, client reads data over GRPC from the Datanode. When the client asks the Datanode to read a file, the DataNode reads that file off of the disk and sends the data to the client over a GRPC connection. Review Comment: Similar comments throughout the doc. Datanode should not be pascal case. ########## hadoop-hdds/docs/content/feature/Short-Circuit-Read.md: ########## @@ -0,0 +1,78 @@ +--- +title: "Short Circuit Local Read in Datanode" +weight: 2 +menu: + main: + parent: Features +summary: Introduction to Ozone Datanode Short Circuit Local Read Feature +--- +<!--- + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. 
+ The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +--> + +By default, client reads data over GRPC from the Datanode. When the client asks the Datanode to read a file, the DataNode reads that file off of the disk and sends the data to the client over a GRPC connection. + +This “short-circuit” local read feature will bypass the DataNode, allowing the client to read the file from local disk directly when the client is co-located with the data on the same server. + +Short-circuit local read can provide a substantial performance boost to many applications, by removing the overhead of network communication. + +## Prerequisite + +Short-circuit local reads make use of a UNIX domain socket. This is a special path in the filesystem that allows the client and the DataNodes to communicate. + +The Hadoop native library `libhadoop.so` provides support to for Unix domain sockets. Please refer to Hadoop's [Native Libraries Guide](https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/NativeLibraries.html) for details. + +The Hadoop version used in Ozone is defined by `hadoop.version` in pom.xml. Before enabling short-circuit local reads, find the `libhadoop.so` from the corresponding version Hadoop release package, put it under one of the directories specified by Java `java.library.path` property. The default value of `java.library.path` depends on the OS and Java version. 
For example, on Linux with OpenJDK 8 it is `/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib`. + +The `ozone checknative` command can be used to detect whether `libhadoop.so` can be found and loaded successfully by Ozone service. + + +## Configuration + +Short-circuit local reads need to be configured on both the DataNode and the client. By default, it is disabled. + +```XML +<property> + <name>ozone.client.read.short-circuit</name> + <value>false</value> + <description>Disable or enable the short-circuit local read feature.</description> +</property> +``` + +It makes use of a UNIX domain socket, a special path in the filesystem. You will need to set a path to this socket. + +```XML +<property> + <name>ozone.domain.socket.path</name> + <value></value> + <description> This is a path to a UNIX domain socket that will be used for + communication between the DataNode and local Ozone clients. + If the string "_PORT" is present in this path, it will be replaced by the TCP port of the DataNode. + </description> +</property> +``` + +The DataNode needs to be able to create this path. On the other hand, it should not be possible for any user except the Ozone user(user who launches Ozone service) or root to create this path. For this reason, paths under `/var/run` or `/var/lib` are often used. + +If you configure the `ozone.domain.socket.path` to some value, for example `/dir1/dir2/ozone_dn_socket`, please make sure that both `dir1` and `dir2` are existing directories, but the file `ozone_dn_socket` does not exist under `dir2`. `ozone_dn_socket` will be created by Ozone Datanode later during its startup. + +### Security Consideration + +To ensure data security and integrity, Ozone will follow the same rules as Hadoop to check permission on the `ozone.domain.socket.path` path as documented in [Socket Path Security](https://wiki.apache.org/hadoop/SocketPathSecurity). 
+It will fail the `ozone.domain.socket.path` verification and disable the feature if the filesystem permissions of the specified path are inadequate. +The verification failure message carries detail instruction about how to fix the problem. Following is an example, Review Comment: These should be soft wrapped. The hard wrapping is not preserved in the rendered markdown and it makes the text harder to edit later. For easier editing set your markdown editor to use soft wraps. ```suggestion To ensure data security and integrity, Ozone will follow the same rules as Hadoop to check permissions on the `ozone.domain.socket.path` path as documented in [Socket Path Security](https://wiki.apache.org/hadoop/SocketPathSecurity). It will fail the `ozone.domain.socket.path` verification and disable the feature if the filesystem permissions of the specified path are inadequate. The verification failure message carries detailed instructions about how to fix the problem. Following is an example: ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.