steveburnett commented on code in PR #10793: URL: https://github.com/apache/incubator-gluten/pull/10793#discussion_r2382855751
########## docs/developers/NewToGluten.md: ########## @@ -4,158 +4,123 @@ title: New To Gluten nav_order: 2 parent: Developer Overview --- -Help users to debug and test with Gluten. -# Environment +# Guide for New Developers -Gluten supports Ubuntu20.04, Ubuntu22.04, CentOS8, CentOS7 and MacOS. +## Environment -## JDK +Gluten supports Ubuntu 20.04/22.04, CentOS 7/8, and MacOS. -Currently, Gluten supports JDK 8 for Spark 3.2/3.3/3.4/3.5. For Spark 3.3 and higher versions, Gluten -supports JDK 11 and 17. Please note since Spark 4.0, JDK 8 will not be supported. So we recommend Velox -backend users to use higher JDK version now to ease the migration for deploying Gluten with Spark-4.0 -in the future. And we may probably upgrade Arrow from 15.0.0 to some higher version, which also requires -JDK 11 is the minimum version. +### JDK -### JDK 8 +Currently, Gluten supports JDK 8 for Spark 3.2, 3.3, 3.4, and 3.5. For Spark 3.3 and later versions, Gluten +also supports JDK 11 and 17. -#### Environment Setting +Note: Starting with Spark 4.0, the minimum required JDK version is 17. -For root user, the environment variables file is `/etc/profile`, it will take effect for all the users. +We recommend using a higher JDK version now to ease migration when deploying Gluten for Spark 4.0 +in the future. In addition, we may upgrade Arrow from 15.0.0 to a newer release, which will require +JDK 11 as the minimum version. -For other user, you can set in `~/.bashrc`. +By default, Gluten compiles packages using JDK 8. Enable maven profile by `-Pjava-17` or `-Pjava-11` to use the corresponding JDK version, and ensure that the JDK version is available in your environment. -#### Guide for Ubuntu - -The default JDK version in ubuntu is java11, we need to set to java8. - -```bash -apt install openjdk-8-jdk -update-alternatives --config java -java -version -``` - -`--config java` to config java executable path, `javac` and other commands can also use this command to config. 
-For some other uses, we suggest to set `JAVA_HOME`. - -```bash -export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/ -JRE_HOME=$JAVA_HOME/jre -export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar -# pay attention to $PATH double quote -export PATH="$PATH:$JAVA_HOME/bin" -``` - -> Must set PATH with double quote in ubuntu. - -### JDK 11/17 - -By default, Gluten compiles package using JDK8. Enable maven profile by `-Pjava-17` to use JDK17 or `-Pjava-11` to use JDK 11, and please make sure your JAVA_HOME is set correctly. - -Apache Spark and Arrow requires setting java args `-Dio.netty.tryReflectionSetAccessible=true`, see [SPARK-29924](https://issues.apache.org/jira/browse/SPARK-29924) and [ARROW-6206](https://issues.apache.org/jira/browse/ARROW-6206). -So please add following configs in `spark-defaults.conf`: +If JDK 11 or a higher version is used, Spark and Arrow require setting the java args `-Dio.netty.tryReflectionSetAccessible=true`, see [SPARK-29924](https://issues.apache.org/jira/browse/SPARK-29924) and [ARROW-6206](https://issues.apache.org/jira/browse/ARROW-6206). +Add the following configs in `spark-defaults.conf`: Review Comment: ```suggestion Add the following configs in `spark-defaults.conf`: ``` Suggest a blank line before, to left-justify the instruction to the reader. This makes it more visible to the reader and reduces the chance the reader will not see it. ########## docs/developers/NewToGluten.md: ########## @@ -4,158 +4,123 @@ title: New To Gluten nav_order: 2 parent: Developer Overview --- -Help users to debug and test with Gluten. -# Environment +# Guide for New Developers -Gluten supports Ubuntu20.04, Ubuntu22.04, CentOS8, CentOS7 and MacOS. +## Environment -## JDK +Gluten supports Ubuntu 20.04/22.04, CentOS 7/8, and MacOS. -Currently, Gluten supports JDK 8 for Spark 3.2/3.3/3.4/3.5. For Spark 3.3 and higher versions, Gluten -supports JDK 11 and 17. Please note since Spark 4.0, JDK 8 will not be supported. 
So we recommend Velox -backend users to use higher JDK version now to ease the migration for deploying Gluten with Spark-4.0 -in the future. And we may probably upgrade Arrow from 15.0.0 to some higher version, which also requires -JDK 11 is the minimum version. +### JDK -### JDK 8 +Currently, Gluten supports JDK 8 for Spark 3.2, 3.3, 3.4, and 3.5. For Spark 3.3 and later versions, Gluten +also supports JDK 11 and 17. -#### Environment Setting +Note: Starting with Spark 4.0, the minimum required JDK version is 17. -For root user, the environment variables file is `/etc/profile`, it will take effect for all the users. +We recommend using a higher JDK version now to ease migration when deploying Gluten for Spark 4.0 +in the future. In addition, we may upgrade Arrow from 15.0.0 to a newer release, which will require +JDK 11 as the minimum version. -For other user, you can set in `~/.bashrc`. +By default, Gluten compiles packages using JDK 8. Enable maven profile by `-Pjava-17` or `-Pjava-11` to use the corresponding JDK version, and ensure that the JDK version is available in your environment. -#### Guide for Ubuntu - -The default JDK version in ubuntu is java11, we need to set to java8. - -```bash -apt install openjdk-8-jdk -update-alternatives --config java -java -version -``` - -`--config java` to config java executable path, `javac` and other commands can also use this command to config. -For some other uses, we suggest to set `JAVA_HOME`. - -```bash -export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/ -JRE_HOME=$JAVA_HOME/jre -export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar -# pay attention to $PATH double quote -export PATH="$PATH:$JAVA_HOME/bin" -``` - -> Must set PATH with double quote in ubuntu. - -### JDK 11/17 - -By default, Gluten compiles package using JDK8. Enable maven profile by `-Pjava-17` to use JDK17 or `-Pjava-11` to use JDK 11, and please make sure your JAVA_HOME is set correctly. 
- -Apache Spark and Arrow requires setting java args `-Dio.netty.tryReflectionSetAccessible=true`, see [SPARK-29924](https://issues.apache.org/jira/browse/SPARK-29924) and [ARROW-6206](https://issues.apache.org/jira/browse/ARROW-6206). -So please add following configs in `spark-defaults.conf`: +If JDK 11 or a higher version is used, Spark and Arrow require setting the java args `-Dio.netty.tryReflectionSetAccessible=true`, see [SPARK-29924](https://issues.apache.org/jira/browse/SPARK-29924) and [ARROW-6206](https://issues.apache.org/jira/browse/ARROW-6206). +Add the following configs in `spark-defaults.conf`: ``` spark.driver.extraJavaOptions=-Dio.netty.tryReflectionSetAccessible=true spark.executor.extraJavaOptions=-Dio.netty.tryReflectionSetAccessible=true ``` -## Maven 3.6.3 or above - -[Maven Download Page](https://maven.apache.org/docs/history.html) -And then set the environment setting. +### Maven 3.6.3 or above Review Comment: ```suggestion ### Maven Gluten requires Maven 3.6.3 or above. ``` I see some inconsistency between the heading "JDK" followed by text explaining the requirements, and the headings "Maven 3.6.3 or above" and "GCC 11 or above". I actually looked for and was surprised by the lack of text below the heading until I realized the information was in the heading. ########## docs/developers/NewToGluten.md: ########## @@ -4,158 +4,123 @@ title: New To Gluten nav_order: 2 parent: Developer Overview --- -Help users to debug and test with Gluten. -# Environment +# Guide for New Developers -Gluten supports Ubuntu20.04, Ubuntu22.04, CentOS8, CentOS7 and MacOS. +## Environment -## JDK +Gluten supports Ubuntu 20.04/22.04, CentOS 7/8, and MacOS. -Currently, Gluten supports JDK 8 for Spark 3.2/3.3/3.4/3.5. For Spark 3.3 and higher versions, Gluten -supports JDK 11 and 17. Please note since Spark 4.0, JDK 8 will not be supported. 
So we recommend Velox -backend users to use higher JDK version now to ease the migration for deploying Gluten with Spark-4.0 -in the future. And we may probably upgrade Arrow from 15.0.0 to some higher version, which also requires -JDK 11 is the minimum version. +### JDK -### JDK 8 +Currently, Gluten supports JDK 8 for Spark 3.2, 3.3, 3.4, and 3.5. For Spark 3.3 and later versions, Gluten +also supports JDK 11 and 17. -#### Environment Setting +Note: Starting with Spark 4.0, the minimum required JDK version is 17. -For root user, the environment variables file is `/etc/profile`, it will take effect for all the users. +We recommend using a higher JDK version now to ease migration when deploying Gluten for Spark 4.0 +in the future. In addition, we may upgrade Arrow from 15.0.0 to a newer release, which will require +JDK 11 as the minimum version. -For other user, you can set in `~/.bashrc`. +By default, Gluten compiles packages using JDK 8. Enable maven profile by `-Pjava-17` or `-Pjava-11` to use the corresponding JDK version, and ensure that the JDK version is available in your environment. -#### Guide for Ubuntu - -The default JDK version in ubuntu is java11, we need to set to java8. - -```bash -apt install openjdk-8-jdk -update-alternatives --config java -java -version -``` - -`--config java` to config java executable path, `javac` and other commands can also use this command to config. -For some other uses, we suggest to set `JAVA_HOME`. - -```bash -export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/ -JRE_HOME=$JAVA_HOME/jre -export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar -# pay attention to $PATH double quote -export PATH="$PATH:$JAVA_HOME/bin" -``` - -> Must set PATH with double quote in ubuntu. - -### JDK 11/17 - -By default, Gluten compiles package using JDK8. Enable maven profile by `-Pjava-17` to use JDK17 or `-Pjava-11` to use JDK 11, and please make sure your JAVA_HOME is set correctly. 
- -Apache Spark and Arrow requires setting java args `-Dio.netty.tryReflectionSetAccessible=true`, see [SPARK-29924](https://issues.apache.org/jira/browse/SPARK-29924) and [ARROW-6206](https://issues.apache.org/jira/browse/ARROW-6206). -So please add following configs in `spark-defaults.conf`: +If JDK 11 or a higher version is used, Spark and Arrow require setting the java args `-Dio.netty.tryReflectionSetAccessible=true`, see [SPARK-29924](https://issues.apache.org/jira/browse/SPARK-29924) and [ARROW-6206](https://issues.apache.org/jira/browse/ARROW-6206). +Add the following configs in `spark-defaults.conf`: ``` spark.driver.extraJavaOptions=-Dio.netty.tryReflectionSetAccessible=true spark.executor.extraJavaOptions=-Dio.netty.tryReflectionSetAccessible=true ``` -## Maven 3.6.3 or above - -[Maven Download Page](https://maven.apache.org/docs/history.html) -And then set the environment setting. +### Maven 3.6.3 or above -## GCC 11 or above +### GCC 11 or above -# Compile Gluten using debug mode +## Development -If you want to just debug java/scala code, there is no need to compile cpp code with debug mode. -You can just refer to [build-gluten-with-velox-backend](../get-started/Velox.md#build-gluten-with-velox-backend). +To debug Java/Scala code, follow the steps in [build-gluten-with-velox-backend](../get-started/Velox.md#build-gluten-with-velox-backend). -If you need to debug cpp code, please compile the backend code and gluten cpp code with debug mode. +To debug C++ code, compile the backend code and gluten C++ code in debug mode. ```bash ## compile Velox backend with benchmark and tests to debug gluten_home/dev/builddeps-veloxbe.sh --build_tests=ON --build_benchmarks=ON --build_type=Debug ``` -If you need to debug the tests in <gluten>/gluten-ut, You need to compile java code with `-P spark-ut`. +Note: To debug the tests in <gluten>/gluten-ut, you must compile java code with `-Pspark-ut`. 
-# Java/scala code development with Intellij +### Java/scala code development -## Linux IntelliJ local debug +#### Linux IntelliJ local debug Install the Linux IntelliJ version, and debug code locally. - Ask your linux maintainer to install the desktop, and then restart the server. -- If you use Moba-XTerm to connect linux server, you don't need to install x11 server, If not (e.g. putty), please follow this guide: -[X11 Forwarding: Setup Instructions for Linux and Mac](https://www.businessnewsdaily.com/11035-how-to-use-x11-forwarding.html) +- If you use Moba-XTerm to connect, you don't need to install x11 server. If you are using another tool, such as putty, follow this guide: + [X11 Forwarding: Setup Instructions for Linux and Mac](https://www.businessnewsdaily.com/11035-how-to-use-x11-forwarding.html) -- Download [IntelliJ Linux community version](https://www.jetbrains.com/idea/download/?fromIDE=#section=linux) to Linux server -- Start Idea, `bash <idea_dir>/idea.sh` +- Download [IntelliJ Linux community version](https://www.jetbrains.com/idea/download/?fromIDE=#section=linux) to Linux server. +- Start Idea using the following command: + `bash <idea_dir>/idea.sh` Review Comment: ```suggestion `bash <idea_dir>/idea.sh` ``` I meant in my previous comment on this line to have the command on its own line, indented under the unordered list item in line 63. By putting the command on its own line, the reader can more easily select the command to copy and paste it to their command prompt. ########## docs/developers/NewToGluten.md: ########## @@ -4,158 +4,123 @@ title: New To Gluten nav_order: 2 parent: Developer Overview --- -Help users to debug and test with Gluten. -# Environment +# Guide for New Developers -Gluten supports Ubuntu20.04, Ubuntu22.04, CentOS8, CentOS7 and MacOS. +## Environment -## JDK +Gluten supports Ubuntu 20.04/22.04, CentOS 7/8, and MacOS. -Currently, Gluten supports JDK 8 for Spark 3.2/3.3/3.4/3.5. 
For Spark 3.3 and higher versions, Gluten -supports JDK 11 and 17. Please note since Spark 4.0, JDK 8 will not be supported. So we recommend Velox -backend users to use higher JDK version now to ease the migration for deploying Gluten with Spark-4.0 -in the future. And we may probably upgrade Arrow from 15.0.0 to some higher version, which also requires -JDK 11 is the minimum version. +### JDK -### JDK 8 +Currently, Gluten supports JDK 8 for Spark 3.2, 3.3, 3.4, and 3.5. For Spark 3.3 and later versions, Gluten +also supports JDK 11 and 17. -#### Environment Setting +Note: Starting with Spark 4.0, the minimum required JDK version is 17. -For root user, the environment variables file is `/etc/profile`, it will take effect for all the users. +We recommend using a higher JDK version now to ease migration when deploying Gluten for Spark 4.0 +in the future. In addition, we may upgrade Arrow from 15.0.0 to a newer release, which will require +JDK 11 as the minimum version. -For other user, you can set in `~/.bashrc`. +By default, Gluten compiles packages using JDK 8. Enable maven profile by `-Pjava-17` or `-Pjava-11` to use the corresponding JDK version, and ensure that the JDK version is available in your environment. -#### Guide for Ubuntu - -The default JDK version in ubuntu is java11, we need to set to java8. - -```bash -apt install openjdk-8-jdk -update-alternatives --config java -java -version -``` - -`--config java` to config java executable path, `javac` and other commands can also use this command to config. -For some other uses, we suggest to set `JAVA_HOME`. - -```bash -export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/ -JRE_HOME=$JAVA_HOME/jre -export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar -# pay attention to $PATH double quote -export PATH="$PATH:$JAVA_HOME/bin" -``` - -> Must set PATH with double quote in ubuntu. - -### JDK 11/17 - -By default, Gluten compiles package using JDK8. 
Enable maven profile by `-Pjava-17` to use JDK17 or `-Pjava-11` to use JDK 11, and please make sure your JAVA_HOME is set correctly. - -Apache Spark and Arrow requires setting java args `-Dio.netty.tryReflectionSetAccessible=true`, see [SPARK-29924](https://issues.apache.org/jira/browse/SPARK-29924) and [ARROW-6206](https://issues.apache.org/jira/browse/ARROW-6206). -So please add following configs in `spark-defaults.conf`: +If JDK 11 or a higher version is used, Spark and Arrow require setting the java args `-Dio.netty.tryReflectionSetAccessible=true`, see [SPARK-29924](https://issues.apache.org/jira/browse/SPARK-29924) and [ARROW-6206](https://issues.apache.org/jira/browse/ARROW-6206). +Add the following configs in `spark-defaults.conf`: ``` spark.driver.extraJavaOptions=-Dio.netty.tryReflectionSetAccessible=true spark.executor.extraJavaOptions=-Dio.netty.tryReflectionSetAccessible=true ``` -## Maven 3.6.3 or above - -[Maven Download Page](https://maven.apache.org/docs/history.html) -And then set the environment setting. +### Maven 3.6.3 or above -## GCC 11 or above +### GCC 11 or above Review Comment: ```suggestion ### GCC Gluten requires GCC 11 or above. ``` ########## docs/developers/NewToGluten.md: ########## @@ -329,11 +205,11 @@ After the above installation, you can optionally do some configuration in Visual * Set Args: `--first-comment-is-literal=True`. Review Comment: ```suggestion * Set **Args**: `--first-comment-is-literal=True`. ``` Note: I am unable to comment on lines 196-204. Please make the following changes: 1. Edit line 196 to: "Here is an example of how to format a file using the command line:" 2. In line 203, make "File", "Preferences", and "Settings" **bold**. 3. In line 204, change "and do the below settings:" to "and set the following settings as shown:" Thanks!
########## docs/developers/NewToGluten.md: ########## @@ -470,24 +343,23 @@ child allocators: 0 at org.apache.spark.memory.SparkMemoryUtil$UnsafeItr.hasNext(SparkMemoryUtil.scala:246) ``` -## CPP code memory leak +### CPP code memory leak -Sometimes you cannot get the coredump symbols, if you debug memory leak, you can write googletest to use valgrind to detect +Sometimes you cannot get the coredump symbols, when debugging a memory leak. You can write a GoogleTest to use valgrind for detection. Review Comment: ```suggestion Sometimes you cannot get the coredump symbols when debugging a memory leak. You can write a GoogleTest to use valgrind for detection. ``` ########## docs/developers/NewToGluten.md: ########## @@ -470,24 +343,23 @@ child allocators: 0 at org.apache.spark.memory.SparkMemoryUtil$UnsafeItr.hasNext(SparkMemoryUtil.scala:246) ``` -## CPP code memory leak +### CPP code memory leak -Sometimes you cannot get the coredump symbols, if you debug memory leak, you can write googletest to use valgrind to detect +Sometimes you cannot get the coredump symbols, when debugging a memory leak. You can write a GoogleTest to use valgrind for detection. ```bash apt install valgrind valgrind --leak-check=yes ./exec_backend_test ``` - -# Run TPC-H and TPC-DS +## Run TPC-H and TPC-DS We supply `<gluten_home>/tools/gluten-it` to execute these queries -Refer to [velox_backend.yml](https://github.com/apache/incubator-gluten/blob/main/.github/workflows/velox_backend.yml) +Refer to [velox_backend_x86.yml](https://github.com/apache/incubator-gluten/blob/main/.github/workflows/velox_backend_x86.yml) -# Run Gluten+Velox on clean machine +## Enable Gluten for Spark -We can run Gluten + Velox on clean machine by one command (supported OS: Ubuntu20.04/22.04, CentOS 7/8, etc.). 
+To enable Gluten Velox backend for Spark, use the following command: Review Comment: ```suggestion To enable Gluten Velox backend for Spark, run the following command: ``` ########## docs/developers/NewToGluten.md: ########## @@ -470,24 +343,23 @@ child allocators: 0 at org.apache.spark.memory.SparkMemoryUtil$UnsafeItr.hasNext(SparkMemoryUtil.scala:246) ``` -## CPP code memory leak +### CPP code memory leak -Sometimes you cannot get the coredump symbols, if you debug memory leak, you can write googletest to use valgrind to detect +Sometimes you cannot get the coredump symbols, when debugging a memory leak. You can write a GoogleTest to use valgrind for detection. ```bash apt install valgrind valgrind --leak-check=yes ./exec_backend_test ``` - -# Run TPC-H and TPC-DS +## Run TPC-H and TPC-DS We supply `<gluten_home>/tools/gluten-it` to execute these queries -Refer to [velox_backend.yml](https://github.com/apache/incubator-gluten/blob/main/.github/workflows/velox_backend.yml) +Refer to [velox_backend_x86.yml](https://github.com/apache/incubator-gluten/blob/main/.github/workflows/velox_backend_x86.yml) Review Comment: ```suggestion See [velox_backend_x86.yml](https://github.com/apache/incubator-gluten/blob/main/.github/workflows/velox_backend_x86.yml). ``` ########## docs/developers/NewToGluten.md: ########## @@ -4,158 +4,123 @@ title: New To Gluten nav_order: 2 parent: Developer Overview --- -Help users to debug and test with Gluten. -# Environment +# Guide for New Developers -Gluten supports Ubuntu20.04, Ubuntu22.04, CentOS8, CentOS7 and MacOS. +## Environment -## JDK +Gluten supports Ubuntu 20.04/22.04, CentOS 7/8, and MacOS. -Currently, Gluten supports JDK 8 for Spark 3.2/3.3/3.4/3.5. For Spark 3.3 and higher versions, Gluten -supports JDK 11 and 17. Please note since Spark 4.0, JDK 8 will not be supported. So we recommend Velox -backend users to use higher JDK version now to ease the migration for deploying Gluten with Spark-4.0 -in the future. 
And we may probably upgrade Arrow from 15.0.0 to some higher version, which also requires -JDK 11 is the minimum version. +### JDK -### JDK 8 +Currently, Gluten supports JDK 8 for Spark 3.2, 3.3, 3.4, and 3.5. For Spark 3.3 and later versions, Gluten +also supports JDK 11 and 17. -#### Environment Setting +Note: Starting with Spark 4.0, the minimum required JDK version is 17. -For root user, the environment variables file is `/etc/profile`, it will take effect for all the users. +We recommend using a higher JDK version now to ease migration when deploying Gluten for Spark 4.0 +in the future. In addition, we may upgrade Arrow from 15.0.0 to a newer release, which will require +JDK 11 as the minimum version. -For other user, you can set in `~/.bashrc`. +By default, Gluten compiles packages using JDK 8. Enable maven profile by `-Pjava-17` or `-Pjava-11` to use the corresponding JDK version, and ensure that the JDK version is available in your environment. -#### Guide for Ubuntu - -The default JDK version in ubuntu is java11, we need to set to java8. - -```bash -apt install openjdk-8-jdk -update-alternatives --config java -java -version -``` - -`--config java` to config java executable path, `javac` and other commands can also use this command to config. -For some other uses, we suggest to set `JAVA_HOME`. - -```bash -export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/ -JRE_HOME=$JAVA_HOME/jre -export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar -# pay attention to $PATH double quote -export PATH="$PATH:$JAVA_HOME/bin" -``` - -> Must set PATH with double quote in ubuntu. - -### JDK 11/17 - -By default, Gluten compiles package using JDK8. Enable maven profile by `-Pjava-17` to use JDK17 or `-Pjava-11` to use JDK 11, and please make sure your JAVA_HOME is set correctly. 
- -Apache Spark and Arrow requires setting java args `-Dio.netty.tryReflectionSetAccessible=true`, see [SPARK-29924](https://issues.apache.org/jira/browse/SPARK-29924) and [ARROW-6206](https://issues.apache.org/jira/browse/ARROW-6206). -So please add following configs in `spark-defaults.conf`: +If JDK 11 or a higher version is used, Spark and Arrow require setting the java args `-Dio.netty.tryReflectionSetAccessible=true`, see [SPARK-29924](https://issues.apache.org/jira/browse/SPARK-29924) and [ARROW-6206](https://issues.apache.org/jira/browse/ARROW-6206). +Add the following configs in `spark-defaults.conf`: ``` spark.driver.extraJavaOptions=-Dio.netty.tryReflectionSetAccessible=true spark.executor.extraJavaOptions=-Dio.netty.tryReflectionSetAccessible=true ``` -## Maven 3.6.3 or above - -[Maven Download Page](https://maven.apache.org/docs/history.html) -And then set the environment setting. +### Maven 3.6.3 or above -## GCC 11 or above +### GCC 11 or above -# Compile Gluten using debug mode +## Development -If you want to just debug java/scala code, there is no need to compile cpp code with debug mode. -You can just refer to [build-gluten-with-velox-backend](../get-started/Velox.md#build-gluten-with-velox-backend). +To debug Java/Scala code, follow the steps in [build-gluten-with-velox-backend](../get-started/Velox.md#build-gluten-with-velox-backend). -If you need to debug cpp code, please compile the backend code and gluten cpp code with debug mode. +To debug C++ code, compile the backend code and gluten C++ code in debug mode. ```bash ## compile Velox backend with benchmark and tests to debug gluten_home/dev/builddeps-veloxbe.sh --build_tests=ON --build_benchmarks=ON --build_type=Debug ``` -If you need to debug the tests in <gluten>/gluten-ut, You need to compile java code with `-P spark-ut`. +Note: To debug the tests in <gluten>/gluten-ut, you must compile java code with `-Pspark-ut`. 
-# Java/scala code development with Intellij +### Java/scala code development -## Linux IntelliJ local debug +#### Linux IntelliJ local debug Install the Linux IntelliJ version, and debug code locally. - Ask your linux maintainer to install the desktop, and then restart the server. -- If you use Moba-XTerm to connect linux server, you don't need to install x11 server, If not (e.g. putty), please follow this guide: -[X11 Forwarding: Setup Instructions for Linux and Mac](https://www.businessnewsdaily.com/11035-how-to-use-x11-forwarding.html) +- If you use Moba-XTerm to connect, you don't need to install x11 server. If you are using another tool, such as putty, follow this guide: + [X11 Forwarding: Setup Instructions for Linux and Mac](https://www.businessnewsdaily.com/11035-how-to-use-x11-forwarding.html) -- Download [IntelliJ Linux community version](https://www.jetbrains.com/idea/download/?fromIDE=#section=linux) to Linux server -- Start Idea, `bash <idea_dir>/idea.sh` +- Download [IntelliJ Linux community version](https://www.jetbrains.com/idea/download/?fromIDE=#section=linux) to Linux server. +- Start Idea using the following command: + `bash <idea_dir>/idea.sh` -## Set up Gluten project +#### Set up Gluten project - Make sure you have compiled Gluten. -- Load the Gluten by File->Open, select <gluten_home/pom.xml>. -- Activate your profiles such as <backends-velox>, and Reload Maven Project, you will find all your need modules have been activated. -- Create breakpoint and debug as you wish, maybe you can try `CTRL+N` to find `TestOperator` to start your test. +- Load the Gluten by **File**->**Open**, select **<gluten_home/pom.xml>**. +- Activate your profiles such as `<backends-velox>`, then **Reload Maven Project** to activate all the needed modules. +- Create breakpoints and debug as you wish. You can use `CTRL+N` to locate a test class to start your test. 
-## Java/Scala code style +#### Java/Scala code style IntelliJ supports importing settings for Java/Scala code style. You can import [intellij-codestyle.xml](../../dev/intellij-codestyle.xml) to your IDE. See [IntelliJ guide](https://www.jetbrains.com/help/idea/configuring-code-style.html#import-code-style). -To generate a fix for Java/Scala code style, you can run one or more of the below commands according to the code modules involved in your PR. +To format Java/Scala code using the Spotless plugin, run the following command: Review Comment: ```suggestion To format Java/Scala code using the [Spotless](https://github.com/diffplug/spotless) plugin, run the following command: ``` Please check that the link I have suggested is the correct one to include here. ########## docs/developers/NewToGluten.md: ########## @@ -4,158 +4,123 @@ title: New To Gluten nav_order: 2 parent: Developer Overview --- -Help users to debug and test with Gluten. -# Environment +# Guide for New Developers -Gluten supports Ubuntu20.04, Ubuntu22.04, CentOS8, CentOS7 and MacOS. +## Environment -## JDK +Gluten supports Ubuntu 20.04/22.04, CentOS 7/8, and MacOS. -Currently, Gluten supports JDK 8 for Spark 3.2/3.3/3.4/3.5. For Spark 3.3 and higher versions, Gluten -supports JDK 11 and 17. Please note since Spark 4.0, JDK 8 will not be supported. So we recommend Velox -backend users to use higher JDK version now to ease the migration for deploying Gluten with Spark-4.0 -in the future. And we may probably upgrade Arrow from 15.0.0 to some higher version, which also requires -JDK 11 is the minimum version. +### JDK -### JDK 8 +Currently, Gluten supports JDK 8 for Spark 3.2, 3.3, 3.4, and 3.5. For Spark 3.3 and later versions, Gluten +also supports JDK 11 and 17. -#### Environment Setting +Note: Starting with Spark 4.0, the minimum required JDK version is 17. 
Review Comment: ```suggestion ``` Reading the page in View File in GitHub, I think lines 21-23 should be part of a paragraph with line 19 "Note: Starting with Spark 4.0, the minimum required JDK version is 17." as the first sentence of the paragraph. ########## docs/developers/NewToGluten.md: ########## @@ -4,158 +4,123 @@ title: New To Gluten nav_order: 2 parent: Developer Overview --- -Help users to debug and test with Gluten. -# Environment +# Guide for New Developers -Gluten supports Ubuntu20.04, Ubuntu22.04, CentOS8, CentOS7 and MacOS. +## Environment -## JDK +Gluten supports Ubuntu 20.04/22.04, CentOS 7/8, and MacOS. -Currently, Gluten supports JDK 8 for Spark 3.2/3.3/3.4/3.5. For Spark 3.3 and higher versions, Gluten -supports JDK 11 and 17. Please note since Spark 4.0, JDK 8 will not be supported. So we recommend Velox -backend users to use higher JDK version now to ease the migration for deploying Gluten with Spark-4.0 -in the future. And we may probably upgrade Arrow from 15.0.0 to some higher version, which also requires -JDK 11 is the minimum version. +### JDK -### JDK 8 +Currently, Gluten supports JDK 8 for Spark 3.2, 3.3, 3.4, and 3.5. For Spark 3.3 and later versions, Gluten +also supports JDK 11 and 17. -#### Environment Setting +Note: Starting with Spark 4.0, the minimum required JDK version is 17. -For root user, the environment variables file is `/etc/profile`, it will take effect for all the users. +We recommend using a higher JDK version now to ease migration when deploying Gluten for Spark 4.0 +in the future. In addition, we may upgrade Arrow from 15.0.0 to a newer release, which will require +JDK 11 as the minimum version. -For other user, you can set in `~/.bashrc`. +By default, Gluten compiles packages using JDK 8. Enable maven profile by `-Pjava-17` or `-Pjava-11` to use the corresponding JDK version, and ensure that the JDK version is available in your environment. 
-#### Guide for Ubuntu - -The default JDK version in ubuntu is java11, we need to set to java8. - -```bash -apt install openjdk-8-jdk -update-alternatives --config java -java -version -``` - -`--config java` to config java executable path, `javac` and other commands can also use this command to config. -For some other uses, we suggest to set `JAVA_HOME`. - -```bash -export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/ -JRE_HOME=$JAVA_HOME/jre -export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar -# pay attention to $PATH double quote -export PATH="$PATH:$JAVA_HOME/bin" -``` - -> Must set PATH with double quote in ubuntu. - -### JDK 11/17 - -By default, Gluten compiles package using JDK8. Enable maven profile by `-Pjava-17` to use JDK17 or `-Pjava-11` to use JDK 11, and please make sure your JAVA_HOME is set correctly. - -Apache Spark and Arrow requires setting java args `-Dio.netty.tryReflectionSetAccessible=true`, see [SPARK-29924](https://issues.apache.org/jira/browse/SPARK-29924) and [ARROW-6206](https://issues.apache.org/jira/browse/ARROW-6206). -So please add following configs in `spark-defaults.conf`: +If JDK 11 or a higher version is used, Spark and Arrow require setting the java args `-Dio.netty.tryReflectionSetAccessible=true`, see [SPARK-29924](https://issues.apache.org/jira/browse/SPARK-29924) and [ARROW-6206](https://issues.apache.org/jira/browse/ARROW-6206). +Add the following configs in `spark-defaults.conf`: ``` spark.driver.extraJavaOptions=-Dio.netty.tryReflectionSetAccessible=true spark.executor.extraJavaOptions=-Dio.netty.tryReflectionSetAccessible=true ``` -## Maven 3.6.3 or above - -[Maven Download Page](https://maven.apache.org/docs/history.html) -And then set the environment setting. +### Maven 3.6.3 or above -## GCC 11 or above +### GCC 11 or above -# Compile Gluten using debug mode +## Development -If you want to just debug java/scala code, there is no need to compile cpp code with debug mode. 
-You can just refer to [build-gluten-with-velox-backend](../get-started/Velox.md#build-gluten-with-velox-backend). +To debug Java/Scala code, follow the steps in [build-gluten-with-velox-backend](../get-started/Velox.md#build-gluten-with-velox-backend). -If you need to debug cpp code, please compile the backend code and gluten cpp code with debug mode. +To debug C++ code, compile the backend code and gluten C++ code in debug mode. ```bash ## compile Velox backend with benchmark and tests to debug gluten_home/dev/builddeps-veloxbe.sh --build_tests=ON --build_benchmarks=ON --build_type=Debug ``` -If you need to debug the tests in <gluten>/gluten-ut, You need to compile java code with `-P spark-ut`. +Note: To debug the tests in <gluten>/gluten-ut, you must compile java code with `-Pspark-ut`. -# Java/scala code development with Intellij +### Java/scala code development -## Linux IntelliJ local debug +#### Linux IntelliJ local debug Install the Linux IntelliJ version, and debug code locally. - Ask your linux maintainer to install the desktop, and then restart the server. -- If you use Moba-XTerm to connect linux server, you don't need to install x11 server, If not (e.g. putty), please follow this guide: -[X11 Forwarding: Setup Instructions for Linux and Mac](https://www.businessnewsdaily.com/11035-how-to-use-x11-forwarding.html) +- If you use Moba-XTerm to connect, you don't need to install x11 server. If you are using another tool, such as putty, follow this guide: + [X11 Forwarding: Setup Instructions for Linux and Mac](https://www.businessnewsdaily.com/11035-how-to-use-x11-forwarding.html) -- Download [IntelliJ Linux community version](https://www.jetbrains.com/idea/download/?fromIDE=#section=linux) to Linux server -- Start Idea, `bash <idea_dir>/idea.sh` +- Download [IntelliJ Linux community version](https://www.jetbrains.com/idea/download/?fromIDE=#section=linux) to Linux server. 
+- Start Idea using the following command: + `bash <idea_dir>/idea.sh` -## Set up Gluten project +#### Set up Gluten project - Make sure you have compiled Gluten. -- Load the Gluten by File->Open, select <gluten_home/pom.xml>. -- Activate your profiles such as <backends-velox>, and Reload Maven Project, you will find all your need modules have been activated. -- Create breakpoint and debug as you wish, maybe you can try `CTRL+N` to find `TestOperator` to start your test. +- Load the Gluten by **File**->**Open**, select **<gluten_home/pom.xml>**. +- Activate your profiles such as `<backends-velox>`, then **Reload Maven Project** to activate all the needed modules. +- Create breakpoints and debug as you wish. You can use `CTRL+N` to locate a test class to start your test. -## Java/Scala code style +#### Java/Scala code style IntelliJ supports importing settings for Java/Scala code style. You can import [intellij-codestyle.xml](../../dev/intellij-codestyle.xml) to your IDE. See [IntelliJ guide](https://www.jetbrains.com/help/idea/configuring-code-style.html#import-code-style). -To generate a fix for Java/Scala code style, you can run one or more of the below commands according to the code modules involved in your PR. +To format Java/Scala code using the Spotless plugin, run the following command: -For Velox backend: -``` -mvn spotless:apply -Pbackends-velox -Pceleborn -Puniffle -Pspark-3.2 -Pspark-ut -DskipTests -mvn spotless:apply -Pbackends-velox -Pceleborn -Puniffle -Pspark-3.3 -Pspark-ut -DskipTests -``` -For Clickhouse backend: ``` -mvn spotless:apply -Pbackends-clickhouse -Pspark-3.2 -Pspark-ut -DskipTests -mvn spotless:apply -Pbackends-clickhouse -Pspark-3.3 -Pspark-ut -DskipTests +./dev/format-scala-code.sh ``` -# CPP code development with Visual Studio Code +### C++ code development + +This guide is for remote debugging by connecting to the remote Linux server using `SSH`. -This guide is for remote debug. We will connect the remote linux server by `SSH`. 
Download and install [Visual Studio Code](https://code.visualstudio.com/Download). Key components found on the left side bar are: - Explorer (Project structure) - Search - Run and Debug - Extensions (Install the C/C++ Extension Pack, Remote Development, and GitLens. C++ Test Mate is also suggested.) -- Remote Explorer (Connect linux server by ssh command, click `+`, then input `ssh [email protected]`) +- Remote Explorer (Connect linux server by ssh command, click **+**, then input `ssh USERNAME@REMOTE_SERVER_IP_ADDRESS`) Review Comment: ```suggestion - Remote Explorer (To connect to the linux server using ssh, click **+**, then enter `ssh USERNAME@REMOTE_SERVER_IP_ADDRESS`) ``` ########## docs/developers/NewToGluten.md: ########## @@ -183,115 +149,25 @@ configurations below: } ``` -After compiling with these updated configs, you should have executable files (such as -`<gluten_home>/cpp/build/velox/tests/velox_shuffle_writer_test`). +After compiling with these updated configs, you should have executable files, such as +`<gluten_home>/cpp/build/velox/tests/velox_shuffle_writer_test`. -Open the `Run and Debug` panel (Ctrl-Shift-D) and then click the link to create a launch.json file. If prompted, -select a debugger like "C++ (GDB/LLDB)". The launch.json will be created at: `<gluten_home>/.vscode/launch.json`. +Open the **Run and Debug** panel (Ctrl-Shift-D) and then click the link to create a launch.json file. If prompted, Review Comment: ```suggestion Open the **Run and Debug** panel (Ctrl-Shift-D) and then click the link to create a `launch.json` file. If prompted, ``` ########## docs/developers/NewToGluten.md: ########## @@ -4,158 +4,123 @@ title: New To Gluten nav_order: 2 parent: Developer Overview --- -Help users to debug and test with Gluten. -# Environment +# Guide for New Developers -Gluten supports Ubuntu20.04, Ubuntu22.04, CentOS8, CentOS7 and MacOS. +## Environment -## JDK +Gluten supports Ubuntu 20.04/22.04, CentOS 7/8, and MacOS. 
-Currently, Gluten supports JDK 8 for Spark 3.2/3.3/3.4/3.5. For Spark 3.3 and higher versions, Gluten -supports JDK 11 and 17. Please note since Spark 4.0, JDK 8 will not be supported. So we recommend Velox -backend users to use higher JDK version now to ease the migration for deploying Gluten with Spark-4.0 -in the future. And we may probably upgrade Arrow from 15.0.0 to some higher version, which also requires -JDK 11 is the minimum version. +### JDK -### JDK 8 +Currently, Gluten supports JDK 8 for Spark 3.2, 3.3, 3.4, and 3.5. For Spark 3.3 and later versions, Gluten +also supports JDK 11 and 17. -#### Environment Setting +Note: Starting with Spark 4.0, the minimum required JDK version is 17. -For root user, the environment variables file is `/etc/profile`, it will take effect for all the users. +We recommend using a higher JDK version now to ease migration when deploying Gluten for Spark 4.0 +in the future. In addition, we may upgrade Arrow from 15.0.0 to a newer release, which will require +JDK 11 as the minimum version. -For other user, you can set in `~/.bashrc`. +By default, Gluten compiles packages using JDK 8. Enable maven profile by `-Pjava-17` or `-Pjava-11` to use the corresponding JDK version, and ensure that the JDK version is available in your environment. -#### Guide for Ubuntu - -The default JDK version in ubuntu is java11, we need to set to java8. - -```bash -apt install openjdk-8-jdk -update-alternatives --config java -java -version -``` - -`--config java` to config java executable path, `javac` and other commands can also use this command to config. -For some other uses, we suggest to set `JAVA_HOME`. - -```bash -export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/ -JRE_HOME=$JAVA_HOME/jre -export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar -# pay attention to $PATH double quote -export PATH="$PATH:$JAVA_HOME/bin" -``` - -> Must set PATH with double quote in ubuntu. 
- -### JDK 11/17 - -By default, Gluten compiles package using JDK8. Enable maven profile by `-Pjava-17` to use JDK17 or `-Pjava-11` to use JDK 11, and please make sure your JAVA_HOME is set correctly. - -Apache Spark and Arrow requires setting java args `-Dio.netty.tryReflectionSetAccessible=true`, see [SPARK-29924](https://issues.apache.org/jira/browse/SPARK-29924) and [ARROW-6206](https://issues.apache.org/jira/browse/ARROW-6206). -So please add following configs in `spark-defaults.conf`: +If JDK 11 or a higher version is used, Spark and Arrow require setting the java args `-Dio.netty.tryReflectionSetAccessible=true`, see [SPARK-29924](https://issues.apache.org/jira/browse/SPARK-29924) and [ARROW-6206](https://issues.apache.org/jira/browse/ARROW-6206). +Add the following configs in `spark-defaults.conf`: ``` spark.driver.extraJavaOptions=-Dio.netty.tryReflectionSetAccessible=true spark.executor.extraJavaOptions=-Dio.netty.tryReflectionSetAccessible=true ``` -## Maven 3.6.3 or above - -[Maven Download Page](https://maven.apache.org/docs/history.html) -And then set the environment setting. +### Maven 3.6.3 or above -## GCC 11 or above +### GCC 11 or above -# Compile Gluten using debug mode +## Development -If you want to just debug java/scala code, there is no need to compile cpp code with debug mode. -You can just refer to [build-gluten-with-velox-backend](../get-started/Velox.md#build-gluten-with-velox-backend). +To debug Java/Scala code, follow the steps in [build-gluten-with-velox-backend](../get-started/Velox.md#build-gluten-with-velox-backend). -If you need to debug cpp code, please compile the backend code and gluten cpp code with debug mode. +To debug C++ code, compile the backend code and gluten C++ code in debug mode. 
```bash ## compile Velox backend with benchmark and tests to debug gluten_home/dev/builddeps-veloxbe.sh --build_tests=ON --build_benchmarks=ON --build_type=Debug ``` -If you need to debug the tests in <gluten>/gluten-ut, You need to compile java code with `-P spark-ut`. +Note: To debug the tests in <gluten>/gluten-ut, you must compile java code with `-Pspark-ut`. -# Java/scala code development with Intellij +### Java/scala code development -## Linux IntelliJ local debug +#### Linux IntelliJ local debug Install the Linux IntelliJ version, and debug code locally. - Ask your linux maintainer to install the desktop, and then restart the server. -- If you use Moba-XTerm to connect linux server, you don't need to install x11 server, If not (e.g. putty), please follow this guide: -[X11 Forwarding: Setup Instructions for Linux and Mac](https://www.businessnewsdaily.com/11035-how-to-use-x11-forwarding.html) +- If you use Moba-XTerm to connect, you don't need to install x11 server. If you are using another tool, such as putty, follow this guide: + [X11 Forwarding: Setup Instructions for Linux and Mac](https://www.businessnewsdaily.com/11035-how-to-use-x11-forwarding.html) -- Download [IntelliJ Linux community version](https://www.jetbrains.com/idea/download/?fromIDE=#section=linux) to Linux server -- Start Idea, `bash <idea_dir>/idea.sh` +- Download [IntelliJ Linux community version](https://www.jetbrains.com/idea/download/?fromIDE=#section=linux) to Linux server. +- Start Idea using the following command: + `bash <idea_dir>/idea.sh` -## Set up Gluten project +#### Set up Gluten project - Make sure you have compiled Gluten. -- Load the Gluten by File->Open, select <gluten_home/pom.xml>. -- Activate your profiles such as <backends-velox>, and Reload Maven Project, you will find all your need modules have been activated. -- Create breakpoint and debug as you wish, maybe you can try `CTRL+N` to find `TestOperator` to start your test. 
+- Load the Gluten by **File**->**Open**, select **<gluten_home/pom.xml>**. +- Activate your profiles such as `<backends-velox>`, then **Reload Maven Project** to activate all the needed modules. +- Create breakpoints and debug as you wish. You can use `CTRL+N` to locate a test class to start your test. -## Java/Scala code style +#### Java/Scala code style IntelliJ supports importing settings for Java/Scala code style. You can import [intellij-codestyle.xml](../../dev/intellij-codestyle.xml) to your IDE. See [IntelliJ guide](https://www.jetbrains.com/help/idea/configuring-code-style.html#import-code-style). -To generate a fix for Java/Scala code style, you can run one or more of the below commands according to the code modules involved in your PR. +To format Java/Scala code using the Spotless plugin, run the following command: -For Velox backend: -``` -mvn spotless:apply -Pbackends-velox -Pceleborn -Puniffle -Pspark-3.2 -Pspark-ut -DskipTests -mvn spotless:apply -Pbackends-velox -Pceleborn -Puniffle -Pspark-3.3 -Pspark-ut -DskipTests -``` -For Clickhouse backend: ``` -mvn spotless:apply -Pbackends-clickhouse -Pspark-3.2 -Pspark-ut -DskipTests -mvn spotless:apply -Pbackends-clickhouse -Pspark-3.3 -Pspark-ut -DskipTests +./dev/format-scala-code.sh ``` -# CPP code development with Visual Studio Code +### C++ code development + +This guide is for remote debugging by connecting to the remote Linux server using `SSH`. -This guide is for remote debug. We will connect the remote linux server by `SSH`. Download and install [Visual Studio Code](https://code.visualstudio.com/Download). Key components found on the left side bar are: - Explorer (Project structure) - Search - Run and Debug - Extensions (Install the C/C++ Extension Pack, Remote Development, and GitLens. C++ Test Mate is also suggested.) 
-- Remote Explorer (Connect linux server by ssh command, click `+`, then input `ssh [email protected]`) +- Remote Explorer (Connect linux server by ssh command, click **+**, then input `ssh USERNAME@REMOTE_SERVER_IP_ADDRESS`) - Manage (Settings) -Input your password in the above pop-up window, it will take a few minutes to install linux vscode server in remote machine folder `~/.vscode-server` -If download failed, delete this folder and try again. +Input your password in the above pop-up window. It will take a few minutes to install the Linux VSCode server in remote machine folder `~/.vscode-server`. Review Comment: ```suggestion Input your password in the above pop-up window. It will take a few minutes to install the Linux VSCode server in the folder `~/.vscode-server` on the remote machine. ``` ########## docs/developers/NewToGluten.md: ########## @@ -4,158 +4,123 @@ title: New To Gluten nav_order: 2 parent: Developer Overview --- -Help users to debug and test with Gluten. -# Environment +# Guide for New Developers -Gluten supports Ubuntu20.04, Ubuntu22.04, CentOS8, CentOS7 and MacOS. +## Environment -## JDK +Gluten supports Ubuntu 20.04/22.04, CentOS 7/8, and MacOS. -Currently, Gluten supports JDK 8 for Spark 3.2/3.3/3.4/3.5. For Spark 3.3 and higher versions, Gluten -supports JDK 11 and 17. Please note since Spark 4.0, JDK 8 will not be supported. So we recommend Velox -backend users to use higher JDK version now to ease the migration for deploying Gluten with Spark-4.0 -in the future. And we may probably upgrade Arrow from 15.0.0 to some higher version, which also requires -JDK 11 is the minimum version. +### JDK -### JDK 8 +Currently, Gluten supports JDK 8 for Spark 3.2, 3.3, 3.4, and 3.5. For Spark 3.3 and later versions, Gluten Review Comment: ```suggestion Gluten supports JDK 8 for Spark 3.2, 3.3, 3.4, and 3.5. 
For Spark 3.3 and later versions, Gluten ``` ########## docs/developers/NewToGluten.md: ########## @@ -183,115 +149,25 @@ configurations below: } ``` -After compiling with these updated configs, you should have executable files (such as -`<gluten_home>/cpp/build/velox/tests/velox_shuffle_writer_test`). +After compiling with these updated configs, you should have executable files, such as +`<gluten_home>/cpp/build/velox/tests/velox_shuffle_writer_test`. -Open the `Run and Debug` panel (Ctrl-Shift-D) and then click the link to create a launch.json file. If prompted, -select a debugger like "C++ (GDB/LLDB)". The launch.json will be created at: `<gluten_home>/.vscode/launch.json`. +Open the **Run and Debug** panel (Ctrl-Shift-D) and then click the link to create a launch.json file. If prompted, +select a debugger like "C++ (GDB/LLDB)". The `launch.json` will be created under `<gluten_home>/.vscode/` (see example [here](../resources/launch.json)). Review Comment: ```suggestion select a debugger like **C++ (GDB/LLDB)**. `launch.json` will be created under `<gluten_home>/.vscode/`. An example `launch.json` file is provided in [launch.json](../resources/launch.json). ``` Revised the link. Good practice in documentation is to include the title or visible words of the destination in the link, so the reader sees "Tomato" in the link, clicks on it, sees "Tomato" on the new page they just opened, and they know they are where they should be. Almost no page on the Internet has the title "Here" :), so try to use the same words in the link the reader will see when they click and open the new page. ########## docs/developers/NewToGluten.md: ########## @@ -4,158 +4,123 @@ title: New To Gluten nav_order: 2 parent: Developer Overview --- -Help users to debug and test with Gluten. -# Environment +# Guide for New Developers -Gluten supports Ubuntu20.04, Ubuntu22.04, CentOS8, CentOS7 and MacOS. +## Environment -## JDK +Gluten supports Ubuntu 20.04/22.04, CentOS 7/8, and MacOS.
-Currently, Gluten supports JDK 8 for Spark 3.2/3.3/3.4/3.5. For Spark 3.3 and higher versions, Gluten -supports JDK 11 and 17. Please note since Spark 4.0, JDK 8 will not be supported. So we recommend Velox -backend users to use higher JDK version now to ease the migration for deploying Gluten with Spark-4.0 -in the future. And we may probably upgrade Arrow from 15.0.0 to some higher version, which also requires -JDK 11 is the minimum version. +### JDK -### JDK 8 +Currently, Gluten supports JDK 8 for Spark 3.2, 3.3, 3.4, and 3.5. For Spark 3.3 and later versions, Gluten +also supports JDK 11 and 17. -#### Environment Setting +Note: Starting with Spark 4.0, the minimum required JDK version is 17. -For root user, the environment variables file is `/etc/profile`, it will take effect for all the users. +We recommend using a higher JDK version now to ease migration when deploying Gluten for Spark 4.0 +in the future. In addition, we may upgrade Arrow from 15.0.0 to a newer release, which will require +JDK 11 as the minimum version. -For other user, you can set in `~/.bashrc`. +By default, Gluten compiles packages using JDK 8. Enable maven profile by `-Pjava-17` or `-Pjava-11` to use the corresponding JDK version, and ensure that the JDK version is available in your environment. -#### Guide for Ubuntu - -The default JDK version in ubuntu is java11, we need to set to java8. - -```bash -apt install openjdk-8-jdk -update-alternatives --config java -java -version -``` - -`--config java` to config java executable path, `javac` and other commands can also use this command to config. -For some other uses, we suggest to set `JAVA_HOME`. - -```bash -export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/ -JRE_HOME=$JAVA_HOME/jre -export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar -# pay attention to $PATH double quote -export PATH="$PATH:$JAVA_HOME/bin" -``` - -> Must set PATH with double quote in ubuntu. 
- -### JDK 11/17 - -By default, Gluten compiles package using JDK8. Enable maven profile by `-Pjava-17` to use JDK17 or `-Pjava-11` to use JDK 11, and please make sure your JAVA_HOME is set correctly. - -Apache Spark and Arrow requires setting java args `-Dio.netty.tryReflectionSetAccessible=true`, see [SPARK-29924](https://issues.apache.org/jira/browse/SPARK-29924) and [ARROW-6206](https://issues.apache.org/jira/browse/ARROW-6206). -So please add following configs in `spark-defaults.conf`: +If JDK 11 or a higher version is used, Spark and Arrow require setting the java args `-Dio.netty.tryReflectionSetAccessible=true`, see [SPARK-29924](https://issues.apache.org/jira/browse/SPARK-29924) and [ARROW-6206](https://issues.apache.org/jira/browse/ARROW-6206). +Add the following configs in `spark-defaults.conf`: ``` spark.driver.extraJavaOptions=-Dio.netty.tryReflectionSetAccessible=true spark.executor.extraJavaOptions=-Dio.netty.tryReflectionSetAccessible=true ``` -## Maven 3.6.3 or above - -[Maven Download Page](https://maven.apache.org/docs/history.html) -And then set the environment setting. +### Maven 3.6.3 or above -## GCC 11 or above +### GCC 11 or above -# Compile Gluten using debug mode +## Development -If you want to just debug java/scala code, there is no need to compile cpp code with debug mode. -You can just refer to [build-gluten-with-velox-backend](../get-started/Velox.md#build-gluten-with-velox-backend). +To debug Java/Scala code, follow the steps in [build-gluten-with-velox-backend](../get-started/Velox.md#build-gluten-with-velox-backend). -If you need to debug cpp code, please compile the backend code and gluten cpp code with debug mode. +To debug C++ code, compile the backend code and gluten C++ code in debug mode. 
```bash ## compile Velox backend with benchmark and tests to debug gluten_home/dev/builddeps-veloxbe.sh --build_tests=ON --build_benchmarks=ON --build_type=Debug ``` -If you need to debug the tests in <gluten>/gluten-ut, You need to compile java code with `-P spark-ut`. +Note: To debug the tests in <gluten>/gluten-ut, you must compile java code with `-Pspark-ut`. -# Java/scala code development with Intellij +### Java/scala code development -## Linux IntelliJ local debug +#### Linux IntelliJ local debug Install the Linux IntelliJ version, and debug code locally. - Ask your linux maintainer to install the desktop, and then restart the server. -- If you use Moba-XTerm to connect linux server, you don't need to install x11 server, If not (e.g. putty), please follow this guide: -[X11 Forwarding: Setup Instructions for Linux and Mac](https://www.businessnewsdaily.com/11035-how-to-use-x11-forwarding.html) +- If you use Moba-XTerm to connect, you don't need to install x11 server. If you are using another tool, such as putty, follow this guide: + [X11 Forwarding: Setup Instructions for Linux and Mac](https://www.businessnewsdaily.com/11035-how-to-use-x11-forwarding.html) -- Download [IntelliJ Linux community version](https://www.jetbrains.com/idea/download/?fromIDE=#section=linux) to Linux server -- Start Idea, `bash <idea_dir>/idea.sh` +- Download [IntelliJ Linux community version](https://www.jetbrains.com/idea/download/?fromIDE=#section=linux) to Linux server. +- Start Idea using the following command: + `bash <idea_dir>/idea.sh` -## Set up Gluten project +#### Set up Gluten project - Make sure you have compiled Gluten. -- Load the Gluten by File->Open, select <gluten_home/pom.xml>. -- Activate your profiles such as <backends-velox>, and Reload Maven Project, you will find all your need modules have been activated. -- Create breakpoint and debug as you wish, maybe you can try `CTRL+N` to find `TestOperator` to start your test. 
+- Load the Gluten by **File**->**Open**, select **<gluten_home/pom.xml>**. +- Activate your profiles such as `<backends-velox>`, then **Reload Maven Project** to activate all the needed modules. +- Create breakpoints and debug as you wish. You can use `CTRL+N` to locate a test class to start your test. -## Java/Scala code style +#### Java/Scala code style IntelliJ supports importing settings for Java/Scala code style. You can import [intellij-codestyle.xml](../../dev/intellij-codestyle.xml) to your IDE. See [IntelliJ guide](https://www.jetbrains.com/help/idea/configuring-code-style.html#import-code-style). -To generate a fix for Java/Scala code style, you can run one or more of the below commands according to the code modules involved in your PR. +To format Java/Scala code using the Spotless plugin, run the following command: -For Velox backend: -``` -mvn spotless:apply -Pbackends-velox -Pceleborn -Puniffle -Pspark-3.2 -Pspark-ut -DskipTests -mvn spotless:apply -Pbackends-velox -Pceleborn -Puniffle -Pspark-3.3 -Pspark-ut -DskipTests -``` -For Clickhouse backend: ``` -mvn spotless:apply -Pbackends-clickhouse -Pspark-3.2 -Pspark-ut -DskipTests -mvn spotless:apply -Pbackends-clickhouse -Pspark-3.3 -Pspark-ut -DskipTests +./dev/format-scala-code.sh ``` -# CPP code development with Visual Studio Code +### C++ code development + +This guide is for remote debugging by connecting to the remote Linux server using `SSH`. -This guide is for remote debug. We will connect the remote linux server by `SSH`. Download and install [Visual Studio Code](https://code.visualstudio.com/Download). Key components found on the left side bar are: - Explorer (Project structure) - Search - Run and Debug - Extensions (Install the C/C++ Extension Pack, Remote Development, and GitLens. C++ Test Mate is also suggested.) 
-- Remote Explorer (Connect linux server by ssh command, click `+`, then input `ssh [email protected]`) +- Remote Explorer (Connect linux server by ssh command, click **+**, then input `ssh USERNAME@REMOTE_SERVER_IP_ADDRESS`) - Manage (Settings) -Input your password in the above pop-up window, it will take a few minutes to install linux vscode server in remote machine folder `~/.vscode-server` -If download failed, delete this folder and try again. +Input your password in the above pop-up window. It will take a few minutes to install the Linux VSCode server in remote machine folder `~/.vscode-server`. + +If the download fails, delete this folder and try again. -## Usage +Note: If VSCode is upgraded, you must download the linux server again. We recommend switching the update mode to `off`. Search `update` in **Manage**->**Settings** to turn off update mode. -### Set up project +#### Set up project -- File->Open Folder // select the Gluten folder -- After the project loads, you will be prompted to "Select CMakeLists.txt". Select the +- Select **File**->**Open Folder**, then select the Gluten folder. +- After the project loads, you will be prompted to **Select CMakeLists.txt**. Select the `${workspaceFolder}/cpp/CMakeLists.txt` file. -- Next, you will be prompted to "Select a Kit" for the Gluten project. Select GCC 11 or above. +- Next, you will be prompted to **Select a Kit** for the Gluten project. Select **GCC 11** or above. -### Settings +#### Settings -VSCode supports 2 ways to set user setting. +VSCode supports two ways to set user setting. Review Comment: ```suggestion VSCode supports two ways to set user settings. ``` ########## docs/developers/NewToGluten.md: ########## @@ -183,115 +149,25 @@ configurations below: } ``` -After compiling with these updated configs, you should have executable files (such as -`<gluten_home>/cpp/build/velox/tests/velox_shuffle_writer_test`). 
+After compiling with these updated configs, you should have executable files, such as +`<gluten_home>/cpp/build/velox/tests/velox_shuffle_writer_test`. -Open the `Run and Debug` panel (Ctrl-Shift-D) and then click the link to create a launch.json file. If prompted, -select a debugger like "C++ (GDB/LLDB)". The launch.json will be created at: `<gluten_home>/.vscode/launch.json`. +Open the **Run and Debug** panel (Ctrl-Shift-D) and then click the link to create a launch.json file. If prompted, +select a debugger like "C++ (GDB/LLDB)". The `launch.json` will be created under `<gluten_home>/.vscode/` (see example [here](../resources/launch.json)). -Click the `Add Configuration` button in launch.json, and select gdb "launch" (to start and debug a program) or -"attach" (to attach and debug a running program). +Note: Change `name`, `program`, `args` for your environment. -#### launch.json example +Click the **Add Configuration** button in `launch.json`, and select gdb **launch** to start a program for debugging or +**attach** to attach a running program for debugging. -```json -{ - // Use IntelliSense to learn about possible attributes. - // Hover to view descriptions of existing attributes. 
- // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387 - "version": "0.2.0", - "configurations": [ - { - "name": "velox shuffle writer test", - "type": "cppdbg", - "request": "launch", - "program": "${workspaceFolder}/cpp/build/velox/tests/velox_shuffle_writer_test", - "args": ["--gtest_filter='*SinglePartitioningShuffleWriter*'"], - "stopAtEntry": false, - "cwd": "${fileDirname}", - "environment": [], - "externalConsole": false, - "MIMode": "gdb", - "setupCommands": [ - { - "description": "Enable pretty-printing for gdb", - "text": "-enable-pretty-printing", - "ignoreFailures": true - }, - { - "description": "Set Disassembly Flavor to Intel", - "text": "-gdb-set disassembly-flavor intel", - "ignoreFailures": true - } - ] - }, - { - "name": "benchmark test", - "type": "cppdbg", - "request": "launch", - "program": "${workspaceFolder}/cpp/build/velox/benchmarks/./generic_benchmark", - "args": [ - "--threads=1", - "--with-shuffle", - "--partitioning=hash", - "--iterations=1", - "--conf=${workspaceFolder}/backends-velox/generated-native-benchmark/conf_12_0_2.ini", - "--plan=${workspaceFolder}/backends-velox/generated-native-benchmark/plan_12_0_2.json", - "--data=${workspaceFolder}/backends-velox/generated-native-benchmark/data_12_0_2_0.parquet,${workspaceFolder}/backends-velox/generated-native-benchmark/data_12_0_2_1.parquet" - ], - "stopAtEntry": false, - "cwd": "${fileDirname}", - "environment": [], - "externalConsole": false, - "MIMode": "gdb", - "setupCommands": [ - { - "description": "Enable pretty-printing for gdb", - "text": "-enable-pretty-printing", - "ignoreFailures": true - }, - { - "description": "Set Disassembly Flavor to Intel", - "text": "-gdb-set disassembly-flavor intel", - "ignoreFailures": true - } - ] - } - - ] -} -``` - -> Change `name`, `program`, `args` for your environment. For example, your generated benchmark example file names may vary. - -Then you can create breakpoint and debug in `Run and Debug` section. 
- -### Velox debug - -For some Velox tests such as `ParquetReaderTest`, tests need to read the parquet file in `<velox_home>/velox/dwio/parquet/tests/examples`, -you should let the screen on `ParquetReaderTest.cpp`, then click `Start Debugging`, otherwise `No such file or directory` exception will be raised. - -## Useful notes +Then you can create breakpoint and debug in **Run and Debug** section. Review Comment: ```suggestion Then you can create breakpoints and debug using **Run and Debug** in Visual Studio Code. ``` ########## docs/developers/NewToGluten.md: ########## @@ -345,26 +221,24 @@ After the above installation, you can optionally do some configuration in Visual 4. Placement of Non-Native Code UTs: Ensure that unit tests for non-native code are placed within org.apache.gluten and org.apache.spark packages. This is important because the CI system runs unit tests from these two paths in parallel. Placing tests in other paths might cause your tests to be ignored. -### View surefire reports of Velox ut in GHA +#### View Surefire reports of Velox unit tests in GHA Surefire reports are invaluable tools in the ecosystem of Java-based applications that utilize the Maven build automation tool. These reports are generated by the Maven Surefire Plugin during the testing phase of your build process. They compile results from unit tests, providing detailed insights into which tests passed or failed, what errors were encountered, and other essential metrics. Surefire reports play a crucial role in the development and maintenance of high-quality software. -We provide surefire reports of Velox ut in GHA, and developers can leverage surefire reports with early bug detection and quality assurance. - -You can check surefire reports: +We provide surefire reports of Velox ut in GHA, and developers can leverage urefire reports with early bug detection and quality assurance. 
Review Comment: ```suggestion We provide Surefire reports of Velox unit tests in GHA so that developers can leverage Surefire reports with early bug detection and quality assurance. ``` ########## docs/developers/NewToGluten.md: ########## @@ -329,11 +205,11 @@ After the above installation, you can optionally do some configuration in Visual * Set Args: `--first-comment-is-literal=True`. * Set Exe Path to the path of the `cmake-format` command. If you installed `cmake-format` in a standard Review Comment: ```suggestion * Set **Exe Path** to the path of the `cmake-format` command. If you installed `cmake-format` in a standard ``` ########## docs/developers/NewToGluten.md: ########## @@ -345,26 +221,24 @@ After the above installation, you can optionally do some configuration in Visual 4. Placement of Non-Native Code UTs: Ensure that unit tests for non-native code are placed within org.apache.gluten and org.apache.spark packages. This is important because the CI system runs unit tests from these two paths in parallel. Placing tests in other paths might cause your tests to be ignored. -### View surefire reports of Velox ut in GHA +#### View Surefire reports of Velox unit tests in GHA Surefire reports are invaluable tools in the ecosystem of Java-based applications that utilize the Maven build automation tool. These reports are generated by the Maven Surefire Plugin during the testing phase of your build process. They compile results from unit tests, providing detailed insights into which tests passed or failed, what errors were encountered, and other essential metrics. Surefire reports play a crucial role in the development and maintenance of high-quality software. -We provide surefire reports of Velox ut in GHA, and developers can leverage surefire reports with early bug detection and quality assurance. 
-
-You can check surefire reports:
+We provide surefire reports of Velox ut in GHA, and developers can leverage urefire reports with early bug detection and quality assurance.
-1. Click `Checks` Tab in PR;
+To check Surefire reports:
-2. Find `Report test results` in `Dev PR`;
-
-3. Then, developers can check the result with summary and annotations.
+1. Click **Checks** Tab in PR;

Review Comment:
```suggestion
1. Click the **Checks** Tab in PR.
```

##########
docs/developers/NewToGluten.md:
##########
@@ -306,14 +182,14 @@ Set config in `settings.json`
 "editor.formatOnSave": true,
 ```
-If exists multiple clang-format version, formatOnSave may not take effect, specify the default formatter
-Search `default formatter` in `Settings`, select Clang-Format.
+If multiple clang-format versions are installed, `formatOnSave` may not take effect. To specify the default formatter,
+search for `default formatter` in **Settings**, then select **Clang-Format**.
-If your formatOnSave still make no effect, you can use shortcut `SHIFT+ALT+F` to format one file manually.
+If `formatOnSave` still has no effect, select a single file and use `SHIFT+ALT+F` to format it manually.
-### CMake format
+#### CMake format
-To format cmake files, like CMakeLists.txt & *.cmake, please install `cmake-format`.
+To format cmake files like CMakeLists.txt & *.cmake, install `cmake-format`.

Review Comment:
```suggestion
To format cmake files like `CMakeLists.txt` and `*.cmake`, install `cmake-format`.
```

##########
docs/developers/NewToGluten.md:
##########
@@ -470,24 +343,23 @@ child allocators: 0
 at org.apache.spark.memory.SparkMemoryUtil$UnsafeItr.hasNext(SparkMemoryUtil.scala:246)
 ```
-## CPP code memory leak
+### CPP code memory leak
-Sometimes you cannot get the coredump symbols, if you debug memory leak, you can write googletest to use valgrind to detect
+Sometimes you cannot get the coredump symbols, when debugging a memory leak. You can write a GoogleTest to use valgrind for detection.
 ```bash
 apt install valgrind
 valgrind --leak-check=yes ./exec_backend_test
 ```
-
-# Run TPC-H and TPC-DS
+## Run TPC-H and TPC-DS
 We supply `<gluten_home>/tools/gluten-it` to execute these queries

Review Comment:
```suggestion
We supply `<gluten_home>/tools/gluten-it` to execute these queries.
```

##########
docs/developers/NewToGluten.md:
##########
@@ -345,26 +221,24 @@ After the above installation, you can optionally do some configuration in Visual
 4. Placement of Non-Native Code UTs: Ensure that unit tests for non-native code are placed within org.apache.gluten and org.apache.spark packages. This is important because the CI system runs unit tests from these two paths in parallel. Placing tests in other paths might cause your tests to be ignored.
-### View surefire reports of Velox ut in GHA
+#### View Surefire reports of Velox unit tests in GHA
 Surefire reports are invaluable tools in the ecosystem of Java-based applications that utilize the Maven build automation tool. These reports are generated by the Maven Surefire Plugin during the testing phase of your build process. They compile results from unit tests, providing detailed insights into which tests passed or failed, what errors were encountered, and other essential metrics. Surefire reports play a crucial role in the development and maintenance of high-quality software.
-We provide surefire reports of Velox ut in GHA, and developers can leverage surefire reports with early bug detection and quality assurance.
-
-You can check surefire reports:
+We provide surefire reports of Velox ut in GHA, and developers can leverage urefire reports with early bug detection and quality assurance.
-1. Click `Checks` Tab in PR;
+To check Surefire reports:
-2. Find `Report test results` in `Dev PR`;
-
-3. Then, developers can check the result with summary and annotations.
+1. Click **Checks** Tab in PR;
+2. Find **Report test results** in **Dev PR**;
+3. There, you can check the results with summary and annotations.
-# Debug cpp code with coredump
+## Debug C++ Code with Core Dump
 ```bash
 mkdir -p /mnt/DP_disk1/core

Review Comment:
I cannot comment on lines 245-276.
1. Please fix the formatting in line 254: 'core-Executor task l-2000883-1671542526' should be `core-Executor task l-2000883-1671542526`
2. Please edit line 268 to read as follows: "Sometimes you only get the C++ exception message. If that happens, you can generate a core dump file by running the following code:"

##########
docs/developers/NewToGluten.md:
##########
@@ -432,11 +305,11 @@ wait to attach....
 (gdb) c
 ```
-# Debug Memory leak
+## Debug Memory Leaks
-## Arrow memory allocator leak
+### Arrow memory allocator leak
-If you receive error message like
+If you receive an error message like the following:
 ```bash
 4/04/18 08:15:38 WARN ArrowBufferAllocators$ArrowBufferAllocatorManager: Detected leaked Arrow allocator [Default], size: 191, process accumulated leaked size: 191...

Review Comment:
Please edit line 318 to the following: "You can open the Arrow allocator debug config by adding the VP option `-Darrow.memory.debug.allocator=true`. That gives you more details, like the following example:"

##########
docs/developers/NewToGluten.md:
##########
@@ -345,26 +221,24 @@ After the above installation, you can optionally do some configuration in Visual
 4. Placement of Non-Native Code UTs: Ensure that unit tests for non-native code are placed within org.apache.gluten and org.apache.spark packages. This is important because the CI system runs unit tests from these two paths in parallel. Placing tests in other paths might cause your tests to be ignored.
-### View surefire reports of Velox ut in GHA
+#### View Surefire reports of Velox unit tests in GHA
 Surefire reports are invaluable tools in the ecosystem of Java-based applications that utilize the Maven build automation tool. These reports are generated by the Maven Surefire Plugin during the testing phase of your build process.
 They compile results from unit tests, providing detailed insights into which tests passed or failed, what errors were encountered, and other essential metrics. Surefire reports play a crucial role in the development and maintenance of high-quality software.
-We provide surefire reports of Velox ut in GHA, and developers can leverage surefire reports with early bug detection and quality assurance.
-
-You can check surefire reports:
+We provide surefire reports of Velox ut in GHA, and developers can leverage urefire reports with early bug detection and quality assurance.
-1. Click `Checks` Tab in PR;
+To check Surefire reports:
-2. Find `Report test results` in `Dev PR`;
-
-3. Then, developers can check the result with summary and annotations.
+1. Click **Checks** Tab in PR;
+2. Find **Report test results** in **Dev PR**;

Review Comment:
```suggestion
2. Find **Report test results** in **Dev PR**.
```
