This is an automated email from the ASF dual-hosted git repository.
felixybw pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/incubator-gluten.git
The following commit(s) were added to refs/heads/main by this push:
new add254245b [VL] Refactor Velox.md (#8478)
add254245b is described below
commit add254245b71ffd852cf1371dc1efb5f3a163898
Author: BInwei Yang <[email protected]>
AuthorDate: Wed Jan 8 19:45:27 2025 -0800
[VL] Refactor Velox.md (#8478)
Some refactor of velox.md
---
docs/get-started/Velox.md | 371 ++------------------------------------
docs/get-started/VeloxGlutenUI.md | 116 ++++++++++++
docs/get-started/VeloxHBM.md | 30 +++
docs/get-started/VeloxIAA.md | 75 ++++++++
docs/get-started/VeloxQAT.md | 155 ++++++++++++++++
5 files changed, 393 insertions(+), 354 deletions(-)
diff --git a/docs/get-started/Velox.md b/docs/get-started/Velox.md
index 863e996796..81bb88c75a 100644
--- a/docs/get-started/Velox.md
+++ b/docs/get-started/Velox.md
@@ -120,6 +120,8 @@ pack required shared libraries into another jar named
`gluten-thirdparty-lib-$LI
set `spark.gluten.loadLibFromJar=true`. Otherwise, you need to install
required shared libraries with **exactly the same versions** on each worker
node. You may find the
libraries list from the third-party jar.
+# Remote storage support
+
## HDFS support
Gluten supports dynamically loading both libhdfs.so and libhdfs3.so at runtime
by using dlopen, allowing the JVM to load the appropriate shared library file
as needed. This means you do not need to set the library path during the
compilation phase.
@@ -221,6 +223,12 @@ cd /path/to/gluten
Currently there are several ways to access S3 in Spark. Please refer to the
[Velox S3](VeloxS3.md) section for more detailed configurations.
+## GCS support
+
+Please refer to [GCS](VeloxGCS.md).
+
+# Remote Shuffle Service Support
+
## Celeborn support
Gluten with velox backend supports
[Celeborn](https://github.com/apache/celeborn) as remote shuffle service.
Currently, the supported Celeborn versions are `0.3.x`, `0.4.x` and `0.5.x`.
@@ -310,6 +318,8 @@ spark.rss.storage.type LOCALFILE_HDFS
spark.dynamicAllocation.enabled false
```
+# Datalake Framework Support
+
## DeltaLake Support
Gluten with velox backend supports [DeltaLake](https://delta.io/) table.
@@ -395,7 +405,7 @@ native validation failed due to: in ProjectRel, Scalar
function name not registe
In the above, the symbol `^` indicates a plan is offloaded to Velox in a
stage. In Spark DAG, all such pipelined plans (consecutive plans marked with
`^`) are plotted
inside an umbrella node named `WholeStageCodegenTransformer` (It's not codegen
node. The naming is just for making it well plotted like Spark Whole Stage
Codegen).
-# Spill (Experimental)
+# Spill
Velox backend supports spilling-to-disk.
@@ -419,251 +429,9 @@ Using the following configuration options to customize
spilling:
Please check the [VeloxNativeUDF.md](../developers/VeloxNativeUDF.md) for more
detailed usage and configurations.
-# High-Bandwidth Memory (HBM) support
-
-Gluten supports allocating memory on HBM. This feature is optional and is
disabled by default. It is implemented on top of [Memkind
library](http://memkind.github.io/memkind/). You can refer to memkind's
[readme](https://github.com/memkind/memkind#memkind) for more details.
-
-## Build Gluten with HBM
-
-Gluten will internally build and link to a specific version of Memkind library
and [hwloc](https://github.com/open-mpi/hwloc). Other dependencies should be
installed on Driver and Worker node first:
-
-```bash
-sudo apt install -y autoconf automake g++ libnuma-dev libtool numactl unzip
libdaxctl-dev
-```
-
-After the set-up, you can now build Gluten with HBM. Below command is used to
enable this feature
-
-```bash
-cd /path/to/gluten
-
-## The script builds four jars for spark 3.2.2, 3.3.1, 3.4.3 and 3.5.1.
-./dev/buildbundle-veloxbe.sh --enable_hbm=ON
-```
-
-## Configure and enable HBM in Spark Application
-
-At runtime, `MEMKIND_HBW_NODES` enviroment variable is detected for
configuring HBM NUMA nodes. For the explaination to this variable, please refer
to memkind's manual page. This can be set for all executors through spark conf,
e.g. `--conf spark.executorEnv.MEMKIND_HBW_NODES=8-15`. Note that memory
allocation fallback is also supported and cannot be turned off. If HBM is
unavailable or fills up, the allocator will use default(DDR) memory.
-
-# Intel® QuickAssist Technology (QAT) support
-
-Gluten supports using Intel® QuickAssist Technology (QAT) for data compression
during Spark Shuffle. It benefits from QAT Hardware-based acceleration on
compression/decompression, and uses Gzip as compression format for higher
compression ratio to reduce the pressure on disks and network transmission.
-
-This feature is based on QAT driver library and
[QATzip](https://github.com/intel/QATzip) library. Please manually download QAT
driver for your system, and follow its README to build and install on all
Driver and Worker node: [Intel® QuickAssist Technology Driver for Linux* – HW
Version
2.0](https://www.intel.com/content/www/us/en/download/765501/intel-quickassist-technology-driver-for-linux-hw-version-2-0.html?wapkw=quickassist).
-
-## Software Requirements
-
-- Download QAT driver for your system, and follow its README to build and
install on all Driver and Worker nodes: [Intel® QuickAssist Technology Driver
for Linux* – HW Version
2.0](https://www.intel.com/content/www/us/en/download/765501/intel-quickassist-technology-driver-for-linux-hw-version-2-0.html?wapkw=quickassist).
-- Below compression libraries need to be installed on all Driver and Worker
nodes:
- - Zlib* library of version 1.2.7 or higher
- - ZSTD* library of version 1.5.4 or higher
- - LZ4* library
-
-## Build Gluten with QAT
-
-1. Setup ICP_ROOT environment variable to the directory where QAT driver is
extracted. This environment variable is required during building Gluten and
running Spark applications. It's recommended to put it in .bashrc on Driver and
Worker nodes.
-
-```bash
-echo "export ICP_ROOT=/path/to/QAT_driver" >> ~/.bashrc
-source ~/.bashrc
-
-# Also set for root if running as non-root user
-sudo su -
-echo "export ICP_ROOT=/path/to/QAT_driver" >> ~/.bashrc
-exit
-```
-
-2. **This step is required if your application is running as Non-root user.**
- The users must be added to the 'qat' group after QAT drvier is installed.
And change the amount of max locked memory for the username that is included in
the group name. This can be done by specifying the limit in
/etc/security/limits.conf.
-
-```bash
-sudo su -
-usermod -aG qat username # need relogin to take effect
-
-# To set 500MB add a line like this in /etc/security/limits.conf
-echo "@qat - memlock 500000" >> /etc/security/limits.conf
-
-exit
-```
-
-3. Enable huge page. This step is required to execute each time after system
reboot. We recommend using systemctl to manage at system startup. You change
the values for "max_huge_pages" and "max_huge_pages_per_process" to make sure
there are enough resources for your workload. As for Spark applications, one
process matches one executor. Within the executor, every task is allocated a
maximum of 5 huge pages.
-
-```bash
-sudo su -
-
-cat << EOF > /usr/local/bin/qat_startup.sh
-#!/bin/bash
-echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
-rmmod usdm_drv
-insmod $ICP_ROOT/build/usdm_drv.ko max_huge_pages=1024
max_huge_pages_per_process=32
-EOF
-
-chmod +x /usr/local/bin/qat_startup.sh
-
-cat << EOF > /etc/systemd/system/qat_startup.service
-[Unit]
-Description=Configure QAT
-
-[Service]
-ExecStart=/usr/local/bin/qat_startup.sh
-
-[Install]
-WantedBy=multi-user.target
-EOF
-
-systemctl enable qat_startup.service
-systemctl start qat_startup.service # setup immediately
-systemctl status qat_startup.service
-
-exit
-```
-
-4. After the setup, you are now ready to build Gluten with QAT. Use the
command below to enable this feature:
-
-```bash
-cd /path/to/gluten
-
-## The script builds four jars for spark 3.2.2, 3.3.1, 3.4.3 and 3.5.1.
-./dev/buildbundle-veloxbe.sh --enable_qat=ON
-```
-
-## Enable QAT with Gzip/Zstd for shuffle compression
-
-1. To offload shuffle compression into QAT, first make sure you have the right
QAT configuration file at /etc/4xxx_devX.conf. We provide a [example
configuration file](../qat/4x16.conf). This configuration sets up to 4
processes that can bind to 1 QAT, and each process can use up to 16 QAT DC
instances.
-
-```bash
-## run as root
-## Overwrite QAT configuration file.
-cd /etc
-for i in {0..7}; do echo "4xxx_dev$i.conf"; done | xargs -i cp -f
/path/to/gluten/docs/qat/4x16.conf {}
-## Restart QAT after updating configuration files.
-adf_ctl restart
-```
-
-2. Check QAT status and make sure the status is up
-
-```bash
-adf_ctl status
-```
-
-The output should be like:
-
-```
-Checking status of all devices.
-There is 8 QAT acceleration device(s) in the system:
- qat_dev0 - type: 4xxx, inst_id: 0, node_id: 0, bsf: 0000:6b:00.0, #accel:
1 #engines: 9 state: up
- qat_dev1 - type: 4xxx, inst_id: 1, node_id: 1, bsf: 0000:70:00.0, #accel:
1 #engines: 9 state: up
- qat_dev2 - type: 4xxx, inst_id: 2, node_id: 2, bsf: 0000:75:00.0, #accel:
1 #engines: 9 state: up
- qat_dev3 - type: 4xxx, inst_id: 3, node_id: 3, bsf: 0000:7a:00.0, #accel:
1 #engines: 9 state: up
- qat_dev4 - type: 4xxx, inst_id: 4, node_id: 4, bsf: 0000:e8:00.0, #accel:
1 #engines: 9 state: up
- qat_dev5 - type: 4xxx, inst_id: 5, node_id: 5, bsf: 0000:ed:00.0, #accel:
1 #engines: 9 state: up
- qat_dev6 - type: 4xxx, inst_id: 6, node_id: 6, bsf: 0000:f2:00.0, #accel:
1 #engines: 9 state: up
- qat_dev7 - type: 4xxx, inst_id: 7, node_id: 7, bsf: 0000:f7:00.0, #accel:
1 #engines: 9 state: up
-```
-
-3. Extra Gluten configurations are required when starting Spark application
-
-```
---conf spark.gluten.sql.columnar.shuffle.codec=gzip # Valid options are gzip
and zstd
---conf spark.gluten.sql.columnar.shuffle.codecBackend=qat
-```
-
-4. You can use below command to check whether QAT is working normally at
run-time. The value of fw_counters should continue to increase during shuffle.
-
-```
-while :; do cat /sys/kernel/debug/qat_4xxx_0000:6b:00.0/fw_counters; sleep 1;
done
-```
-
-## QAT driver references
-
-**Documentation**
-
-[README Text Files
(README_QAT20.L.1.0.0-00021.txt)](https://downloadmirror.intel.com/765523/README_QAT20.L.1.0.0-00021.txt)
-
-**Release Notes**
-
-Check out the [Intel® QuickAssist Technology Software for
Linux*](https://www.intel.com/content/www/us/en/content-details/632507/intel-quickassist-technology-intel-qat-software-for-linux-release-notes-hardware-version-2-0.html)
- Release Notes for the latest changes in this release.
-
-**Getting Started Guide**
-
-Check out the [Intel® QuickAssist Technology Software for
Linux*](https://www.intel.com/content/www/us/en/content-details/632506/intel-quickassist-technology-intel-qat-software-for-linux-getting-started-guide-hardware-version-2-0.html)
- Getting Started Guide for detailed installation instructions.
-
-**Programmer's Guide**
-
-Check out the [Intel® QuickAssist Technology Software for
Linux*](https://www.intel.com/content/www/us/en/content-details/743912/intel-quickassist-technology-intel-qat-software-for-linux-programmers-guide-hardware-version-2-0.html)
- Programmer's Guide for software usage guidelines.
-
-For more Intel® QuickAssist Technology resources go to [Intel® QuickAssist
Technology (Intel® QAT)](https://developer.intel.com/quickassist)
-
-# Intel® In-memory Analytics Accelerator (IAA/IAX) support
-
-Similar to Intel® QAT, Gluten supports using Intel® In-memory Analytics
Accelerator (IAA, also called IAX) for data compression during Spark Shuffle.
It benefits from IAA Hardware-based acceleration on compression/decompression,
and uses Gzip as compression format for higher compression ratio to reduce the
pressure on disks and network transmission.
-
-This feature is based on Intel® [QPL](https://github.com/intel/qpl).
-
-## Build Gluten with IAA
-
-Gluten will internally build and link to a specific version of QPL library,
but extra environment setup is still required. Please refer to [QPL
Installation
Guide](https://intel.github.io/qpl/documentation/get_started_docs/installation.html)
to install dependencies and configure accelerators.
-
-**This step is required if your application is running as Non-root user.**
-Create a group for the users who have privilege to use IAA, and grant group
iaa read/write access to the IAA Work-Queues.
-
-```bash
-sudo groupadd iaa
-sudo usermod -aG iaa username # need to relogin
-sudo chgrp -R iaa /dev/iax
-sudo chmod -R g+rw /dev/iax
-```
-
-After the set-up, you can now build Gluten with QAT. Below command is used to
enable this feature
-
-```bash
-cd /path/to/gluten
-
-## The script builds four jars for spark 3.2.2, 3.3.1, 3.4.3 and 3.5.1.
-./dev/buildbundle-veloxbe.sh --enable_iaa=ON
-```
-
-## Enable IAA with Gzip Compression for shuffle compression
-
-1. To enable QAT at run-time, first make sure you have configured the IAA
Work-Queues correctly, and the file permissions of /dev/iax/wqX.0 are correct.
-
-```bash
-sudo ls -l /dev/iax
-```
-
-The output should be like:
-
-```
-total 0
-crw-rw---- 1 root iaa 509, 0 Apr 5 18:54 wq1.0
-crw-rw---- 1 root iaa 509, 5 Apr 5 18:54 wq11.0
-crw-rw---- 1 root iaa 509, 6 Apr 5 18:54 wq13.0
-crw-rw---- 1 root iaa 509, 7 Apr 5 18:54 wq15.0
-crw-rw---- 1 root iaa 509, 1 Apr 5 18:54 wq3.0
-crw-rw---- 1 root iaa 509, 2 Apr 5 18:54 wq5.0
-crw-rw---- 1 root iaa 509, 3 Apr 5 18:54 wq7.0
-crw-rw---- 1 root iaa 509, 4 Apr 5 18:54 wq9.0
-```
-
-2. Extra Gluten configurations are required when starting Spark application
-
-```
---conf spark.gluten.sql.columnar.shuffle.codec=gzip
---conf spark.gluten.sql.columnar.shuffle.codecBackend=iaa
-```
-
-## IAA references
-
-**Intel® IAA Enabling Guide**
-
-Check out the [Intel® In-Memory Analytics Accelerator (Intel® IAA) Enabling
Guide](https://www.intel.com/content/www/us/en/developer/articles/technical/intel-iaa-enabling-guide.html)
-
-**Intel® QPL Documentation**
-
-Check out the [Intel® Query Processing Library (Intel® QPL)
Documentation](https://intel.github.io/qpl/index.html)
-
# Test TPC-H or TPC-DS on Gluten with Velox backend
-All TPC-H and TPC-DS queries are supported in Gluten Velox backend.
+All TPC-H and TPC-DS queries are supported in the Gluten Velox backend. You may
refer to the [notebook](../../tools/workload/benchmark_velox) we used for
performance testing.
## Data preparation
@@ -726,118 +494,9 @@ Both Parquet and ORC datasets are sf1024.
| TPC-H Q6 | 13.6 | 21.6 | 34.9 |
| TPC-H Q1 | 26.1 | 76.7 | 84.9 |
-# External reference setup
-
-TO ease your first-hand experience of using Gluten, we have set up an external
reference cluster. If you are interested, please contact [email protected].
-
# Gluten UI
-## Gluten event
-
-Gluten provides two events `GlutenBuildInfoEvent` and
`GlutenPlanFallbackEvent`:
-
-- GlutenBuildInfoEvent, it contains the Gluten build information so that we
are able to be aware of the environment when doing some debug.
- It includes `Java Version`, `Scala Version`, `GCC Version`, `Gluten
Version`, `Spark Version`, `Hadoop Version`, `Gluten Revision`, `Backend`,
`Backend Revision`, etc.
-
-- GlutenPlanFallbackEvent, it contains the fallback information for each query
execution.
- Note, if the query execution is in AQE, then Gluten will post it for each
stage.
-
-Developers can register `SparkListener` to handle these two Gluten events.
-
-## SQL tab
-
-Gluten provides a tab based on Spark UI, named `Gluten SQL / DataFrame`
-
-
-
-This tab contains two parts:
-
-1. The Gluten build information.
-2. SQL/Dataframe queries fallback information.
-
-If you want to disable Gluten UI, add a config when submitting `--conf
spark.gluten.ui.enabled=false`.
-
-## History server
-
-Gluten UI also supports Spark history server. Add gluten-ui jar into the
history server classpath, e.g., $SPARK_HOME/jars, then restart history server.
-
-## Native plan string
-
-Gluten supports inject native plan string into Spark explain with formatted
mode by setting `--conf spark.gluten.sql.injectNativePlanStringToExplain=true`.
-Here is an example, how Gluten show the native plan string.
-
-```
-(9) WholeStageCodegenTransformer (2)
-Input [6]: [c1#0L, c2#1L, c3#2L, c1#3L, c2#4L, c3#5L]
-Arguments: false
-Native Plan:
--- Project[expressions: (n3_6:BIGINT, "n0_0"), (n3_7:BIGINT, "n0_1"),
(n3_8:BIGINT, "n0_2"), (n3_9:BIGINT, "n1_0"), (n3_10:BIGINT, "n1_1"),
(n3_11:BIGINT, "n1_2")] -> n3_6:BIGINT, n3_7:BIGINT, n3_8:BIGINT, n3_9:BIGINT,
n3_10:BIGINT, n3_11:BIGINT
- -- HashJoin[INNER n1_1=n0_1] -> n1_0:BIGINT, n1_1:BIGINT, n1_2:BIGINT,
n0_0:BIGINT, n0_1:BIGINT, n0_2:BIGINT
- -- TableScan[table: hive_table, range filters: [(c2, Filter(IsNotNull,
deterministic, null not allowed))]] -> n1_0:BIGINT, n1_1:BIGINT, n1_2:BIGINT
- -- ValueStream[] -> n0_0:BIGINT, n0_1:BIGINT, n0_2:BIGINT
-```
-
-## Native plan with stats
-
-Gluten supports print native plan with stats to executor system output stream
by setting `--conf spark.gluten.sql.debug=true`.
-Note that, the plan string with stats is task level which may cause executor
log size big. Here is an example, how Gluten show the native plan string with
stats.
-
-```
-I20231121 10:19:42.348845 90094332 WholeStageResultIterator.cc:220] Native
Plan with stats for: [Stage: 1 TID: 16]
--- Project[expressions: (n3_6:BIGINT, "n0_0"), (n3_7:BIGINT, "n0_1"),
(n3_8:BIGINT, "n0_2"), (n3_9:BIGINT, "n1_0"), (n3_10:BIGINT, "n1_1"),
(n3_11:BIGINT, "n1_2")] -> n3_6:BIGINT, n3_7:BIGINT, n3_8:BIGINT, n3_9:BIGINT,
n3_10:BIGINT, n3_11:BIGINT
- Output: 27 rows (3.56KB, 3 batches), Cpu time: 10.58us, Blocked wall time:
0ns, Peak memory: 0B, Memory allocations: 0, Threads: 1
- queuedWallNanos sum: 2.00us, count: 1, min: 2.00us, max:
2.00us
- runningAddInputWallNanos sum: 626ns, count: 1, min: 626ns, max: 626ns
- runningFinishWallNanos sum: 0ns, count: 1, min: 0ns, max: 0ns
- runningGetOutputWallNanos sum: 5.54us, count: 1, min: 5.54us, max:
5.54us
- -- HashJoin[INNER n1_1=n0_1] -> n1_0:BIGINT, n1_1:BIGINT, n1_2:BIGINT,
n0_0:BIGINT, n0_1:BIGINT, n0_2:BIGINT
- Output: 27 rows (3.56KB, 3 batches), Cpu time: 223.00us, Blocked wall
time: 0ns, Peak memory: 93.12KB, Memory allocations: 15
- HashBuild: Input: 10 rows (960B, 10 batches), Output: 0 rows (0B, 0
batches), Cpu time: 185.67us, Blocked wall time: 0ns, Peak memory: 68.00KB,
Memory allocations: 2, Threads: 1
- distinctKey0 sum: 4, count: 1, min: 4, max: 4
- hashtable.capacity sum: 4, count: 1, min: 4, max: 4
- hashtable.numDistinct sum: 10, count: 1, min: 10, max: 10
- hashtable.numRehashes sum: 1, count: 1, min: 1, max: 1
- queuedWallNanos sum: 0ns, count: 1, min: 0ns, max: 0ns
- rangeKey0 sum: 4, count: 1, min: 4, max: 4
- runningAddInputWallNanos sum: 1.27ms, count: 1, min: 1.27ms, max:
1.27ms
- runningFinishWallNanos sum: 0ns, count: 1, min: 0ns, max: 0ns
- runningGetOutputWallNanos sum: 1.29us, count: 1, min: 1.29us, max:
1.29us
- H23/11/21 10:19:42 INFO TaskSetManager: Finished task 3.0 in stage 1.0
(TID 13) in 335 ms on 10.221.97.35 (executor driver) (1/10)
-ashProbe: Input: 9 rows (864B, 3 batches), Output: 27 rows (3.56KB, 3
batches), Cpu time: 37.33us, Blocked wall time: 0ns, Peak memory: 25.12KB,
Memory allocations: 13, Threads: 1
- dynamicFiltersProduced sum: 1, count: 1, min: 1, max: 1
- queuedWallNanos sum: 0ns, count: 1, min: 0ns, max: 0ns
- runningAddInputWallNanos sum: 4.54us, count: 1, min: 4.54us, max:
4.54us
- runningFinishWallNanos sum: 83ns, count: 1, min: 83ns, max: 83ns
- runningGetOutputWallNanos sum: 29.08us, count: 1, min: 29.08us,
max: 29.08us
- -- TableScan[table: hive_table, range filters: [(c2, Filter(IsNotNull,
deterministic, null not allowed))]] -> n1_0:BIGINT, n1_1:BIGINT, n1_2:BIGINT
- Input: 9 rows (864B, 3 batches), Output: 9 rows (864B, 3 batches), Cpu
time: 630.75us, Blocked wall time: 0ns, Peak memory: 2.44KB, Memory
allocations: 63, Threads: 1, Splits: 3
- dataSourceWallNanos sum: 102.00us, count: 1, min:
102.00us, max: 102.00us
- dynamicFiltersAccepted sum: 1, count: 1, min: 1, max: 1
- flattenStringDictionaryValues sum: 0, count: 1, min: 0, max: 0
- ioWaitNanos sum: 312.00us, count: 1, min:
312.00us, max: 312.00us
- localReadBytes sum: 0B, count: 1, min: 0B, max: 0B
- numLocalRead sum: 0, count: 1, min: 0, max: 0
- numPrefetch sum: 0, count: 1, min: 0, max: 0
- numRamRead sum: 0, count: 1, min: 0, max: 0
- numStorageRead sum: 6, count: 1, min: 6, max: 6
- overreadBytes sum: 0B, count: 1, min: 0B, max: 0B
- prefetchBytes sum: 0B, count: 1, min: 0B, max: 0B
- queryThreadIoLatency sum: 12, count: 1, min: 12, max: 12
- ramReadBytes sum: 0B, count: 1, min: 0B, max: 0B
- runningAddInputWallNanos sum: 0ns, count: 1, min: 0ns, max:
0ns
- runningFinishWallNanos sum: 125ns, count: 1, min: 125ns,
max: 125ns
- runningGetOutputWallNanos sum: 1.07ms, count: 1, min: 1.07ms,
max: 1.07ms
- skippedSplitBytes sum: 0B, count: 1, min: 0B, max: 0B
- skippedSplits sum: 0, count: 1, min: 0, max: 0
- skippedStrides sum: 0, count: 1, min: 0, max: 0
- storageReadBytes sum: 3.44KB, count: 1, min: 3.44KB,
max: 3.44KB
- totalScanTime sum: 0ns, count: 1, min: 0ns, max:
0ns
- -- ValueStream[] -> n0_0:BIGINT, n0_1:BIGINT, n0_2:BIGINT
- Input: 0 rows (0B, 0 batches), Output: 10 rows (960B, 10 batches), Cpu
time: 1.03ms, Blocked wall time: 0ns, Peak memory: 0B, Memory allocations: 0,
Threads: 1
- runningAddInputWallNanos sum: 0ns, count: 1, min: 0ns, max: 0ns
- runningFinishWallNanos sum: 54.62us, count: 1, min: 54.62us,
max: 54.62us
- runningGetOutputWallNanos sum: 1.10ms, count: 1, min: 1.10ms,
max: 1.10ms
-```
+Please refer to [Gluten UI](VeloxGlutenUI.md).
# Gluten Implicits
@@ -852,3 +511,7 @@ df.fallbackSummary
Note that if AQE is enabled but the query is not materialized, Gluten will
re-plan
the query execution with AQE disabled. It is a workaround to get the final
plan, and it may
cause results inconsistent with a materialized query. However, we have no
choice.
+
+# Accelerators
+
+Please refer to [HBM](VeloxHBM.md), [QAT](VeloxQAT.md) and [IAA](VeloxIAA.md)
for details.
diff --git a/docs/get-started/VeloxGlutenUI.md
b/docs/get-started/VeloxGlutenUI.md
new file mode 100644
index 0000000000..6f40e25b9e
--- /dev/null
+++ b/docs/get-started/VeloxGlutenUI.md
@@ -0,0 +1,116 @@
+---
+layout: page
+title: Gluten with Velox Backend
+nav_order: 1
+parent: Getting-Started
+---
+
+
+# Gluten UI
+
+## Gluten event
+
+Gluten provides two events `GlutenBuildInfoEvent` and
`GlutenPlanFallbackEvent`:
+
+- `GlutenBuildInfoEvent` contains the Gluten build information, which helps
identify the environment when debugging.
+  It includes `Java Version`, `Scala Version`, `GCC Version`, `Gluten
Version`, `Spark Version`, `Hadoop Version`, `Gluten Revision`, `Backend`,
`Backend Revision`, etc.
+
+- `GlutenPlanFallbackEvent` contains the fallback information for each query
execution.
+  Note that if the query runs under AQE, Gluten posts the event for each
stage.
+
+Developers can register `SparkListener` to handle these two Gluten events.
+
+## SQL tab
+
+Gluten provides a tab in the Spark UI, named `Gluten SQL / DataFrame`.
+
+
+
+This tab contains two parts:
+
+1. The Gluten build information.
+2. SQL/DataFrame query fallback information.
+
+To disable the Gluten UI, add `--conf spark.gluten.ui.enabled=false` when
submitting the application.
+
+## History server
+
+The Gluten UI also supports the Spark history server. Add the gluten-ui jar
into the history server classpath, e.g., `$SPARK_HOME/jars`, then restart the
history server.
+
+## Native plan string
+
+Gluten supports injecting the native plan string into Spark explain (formatted
mode) by setting `--conf spark.gluten.sql.injectNativePlanStringToExplain=true`.
+Here is an example of how Gluten shows the native plan string.
+
+```
+(9) WholeStageCodegenTransformer (2)
+Input [6]: [c1#0L, c2#1L, c3#2L, c1#3L, c2#4L, c3#5L]
+Arguments: false
+Native Plan:
+-- Project[expressions: (n3_6:BIGINT, "n0_0"), (n3_7:BIGINT, "n0_1"),
(n3_8:BIGINT, "n0_2"), (n3_9:BIGINT, "n1_0"), (n3_10:BIGINT, "n1_1"),
(n3_11:BIGINT, "n1_2")] -> n3_6:BIGINT, n3_7:BIGINT, n3_8:BIGINT, n3_9:BIGINT,
n3_10:BIGINT, n3_11:BIGINT
+ -- HashJoin[INNER n1_1=n0_1] -> n1_0:BIGINT, n1_1:BIGINT, n1_2:BIGINT,
n0_0:BIGINT, n0_1:BIGINT, n0_2:BIGINT
+ -- TableScan[table: hive_table, range filters: [(c2, Filter(IsNotNull,
deterministic, null not allowed))]] -> n1_0:BIGINT, n1_1:BIGINT, n1_2:BIGINT
+ -- ValueStream[] -> n0_0:BIGINT, n0_1:BIGINT, n0_2:BIGINT
+```
+
+## Native plan with stats
+
+Gluten supports printing the native plan with stats to the executor's system
output stream by setting `--conf spark.gluten.sql.debug=true`.
+Note that the plan string with stats is produced per task, which may make the
executor log large. Here is an example of how Gluten shows the native plan
string with stats.
+
+```
+I20231121 10:19:42.348845 90094332 WholeStageResultIterator.cc:220] Native
Plan with stats for: [Stage: 1 TID: 16]
+-- Project[expressions: (n3_6:BIGINT, "n0_0"), (n3_7:BIGINT, "n0_1"),
(n3_8:BIGINT, "n0_2"), (n3_9:BIGINT, "n1_0"), (n3_10:BIGINT, "n1_1"),
(n3_11:BIGINT, "n1_2")] -> n3_6:BIGINT, n3_7:BIGINT, n3_8:BIGINT, n3_9:BIGINT,
n3_10:BIGINT, n3_11:BIGINT
+ Output: 27 rows (3.56KB, 3 batches), Cpu time: 10.58us, Blocked wall time:
0ns, Peak memory: 0B, Memory allocations: 0, Threads: 1
+ queuedWallNanos sum: 2.00us, count: 1, min: 2.00us, max:
2.00us
+ runningAddInputWallNanos sum: 626ns, count: 1, min: 626ns, max: 626ns
+ runningFinishWallNanos sum: 0ns, count: 1, min: 0ns, max: 0ns
+ runningGetOutputWallNanos sum: 5.54us, count: 1, min: 5.54us, max:
5.54us
+ -- HashJoin[INNER n1_1=n0_1] -> n1_0:BIGINT, n1_1:BIGINT, n1_2:BIGINT,
n0_0:BIGINT, n0_1:BIGINT, n0_2:BIGINT
+ Output: 27 rows (3.56KB, 3 batches), Cpu time: 223.00us, Blocked wall
time: 0ns, Peak memory: 93.12KB, Memory allocations: 15
+ HashBuild: Input: 10 rows (960B, 10 batches), Output: 0 rows (0B, 0
batches), Cpu time: 185.67us, Blocked wall time: 0ns, Peak memory: 68.00KB,
Memory allocations: 2, Threads: 1
+ distinctKey0 sum: 4, count: 1, min: 4, max: 4
+ hashtable.capacity sum: 4, count: 1, min: 4, max: 4
+ hashtable.numDistinct sum: 10, count: 1, min: 10, max: 10
+ hashtable.numRehashes sum: 1, count: 1, min: 1, max: 1
+ queuedWallNanos sum: 0ns, count: 1, min: 0ns, max: 0ns
+ rangeKey0 sum: 4, count: 1, min: 4, max: 4
+ runningAddInputWallNanos sum: 1.27ms, count: 1, min: 1.27ms, max:
1.27ms
+ runningFinishWallNanos sum: 0ns, count: 1, min: 0ns, max: 0ns
+ runningGetOutputWallNanos sum: 1.29us, count: 1, min: 1.29us, max:
1.29us
+ H23/11/21 10:19:42 INFO TaskSetManager: Finished task 3.0 in stage 1.0
(TID 13) in 335 ms on 10.221.97.35 (executor driver) (1/10)
+ashProbe: Input: 9 rows (864B, 3 batches), Output: 27 rows (3.56KB, 3
batches), Cpu time: 37.33us, Blocked wall time: 0ns, Peak memory: 25.12KB,
Memory allocations: 13, Threads: 1
+ dynamicFiltersProduced sum: 1, count: 1, min: 1, max: 1
+ queuedWallNanos sum: 0ns, count: 1, min: 0ns, max: 0ns
+ runningAddInputWallNanos sum: 4.54us, count: 1, min: 4.54us, max:
4.54us
+ runningFinishWallNanos sum: 83ns, count: 1, min: 83ns, max: 83ns
+ runningGetOutputWallNanos sum: 29.08us, count: 1, min: 29.08us,
max: 29.08us
+ -- TableScan[table: hive_table, range filters: [(c2, Filter(IsNotNull,
deterministic, null not allowed))]] -> n1_0:BIGINT, n1_1:BIGINT, n1_2:BIGINT
+ Input: 9 rows (864B, 3 batches), Output: 9 rows (864B, 3 batches), Cpu
time: 630.75us, Blocked wall time: 0ns, Peak memory: 2.44KB, Memory
allocations: 63, Threads: 1, Splits: 3
+ dataSourceWallNanos sum: 102.00us, count: 1, min:
102.00us, max: 102.00us
+ dynamicFiltersAccepted sum: 1, count: 1, min: 1, max: 1
+ flattenStringDictionaryValues sum: 0, count: 1, min: 0, max: 0
+ ioWaitNanos sum: 312.00us, count: 1, min:
312.00us, max: 312.00us
+ localReadBytes sum: 0B, count: 1, min: 0B, max: 0B
+ numLocalRead sum: 0, count: 1, min: 0, max: 0
+ numPrefetch sum: 0, count: 1, min: 0, max: 0
+ numRamRead sum: 0, count: 1, min: 0, max: 0
+ numStorageRead sum: 6, count: 1, min: 6, max: 6
+ overreadBytes sum: 0B, count: 1, min: 0B, max: 0B
+ prefetchBytes sum: 0B, count: 1, min: 0B, max: 0B
+ queryThreadIoLatency sum: 12, count: 1, min: 12, max: 12
+ ramReadBytes sum: 0B, count: 1, min: 0B, max: 0B
+ runningAddInputWallNanos sum: 0ns, count: 1, min: 0ns, max:
0ns
+ runningFinishWallNanos sum: 125ns, count: 1, min: 125ns,
max: 125ns
+ runningGetOutputWallNanos sum: 1.07ms, count: 1, min: 1.07ms,
max: 1.07ms
+ skippedSplitBytes sum: 0B, count: 1, min: 0B, max: 0B
+ skippedSplits sum: 0, count: 1, min: 0, max: 0
+ skippedStrides sum: 0, count: 1, min: 0, max: 0
+ storageReadBytes sum: 3.44KB, count: 1, min: 3.44KB,
max: 3.44KB
+ totalScanTime sum: 0ns, count: 1, min: 0ns, max:
0ns
+ -- ValueStream[] -> n0_0:BIGINT, n0_1:BIGINT, n0_2:BIGINT
+ Input: 0 rows (0B, 0 batches), Output: 10 rows (960B, 10 batches), Cpu
time: 1.03ms, Blocked wall time: 0ns, Peak memory: 0B, Memory allocations: 0,
Threads: 1
+ runningAddInputWallNanos sum: 0ns, count: 1, min: 0ns, max: 0ns
+ runningFinishWallNanos sum: 54.62us, count: 1, min: 54.62us,
max: 54.62us
+ runningGetOutputWallNanos sum: 1.10ms, count: 1, min: 1.10ms,
max: 1.10ms
+```
diff --git a/docs/get-started/VeloxHBM.md b/docs/get-started/VeloxHBM.md
new file mode 100644
index 0000000000..7056bb2fb0
--- /dev/null
+++ b/docs/get-started/VeloxHBM.md
@@ -0,0 +1,30 @@
+---
+layout: page
+title: HBM Support in Velox Backend
+nav_order: 1
+parent: Getting-Started
+---
+# High-Bandwidth Memory (HBM) support
+
+Gluten supports allocating memory on HBM. This feature is optional and is
disabled by default. It is implemented on top of [Memkind
library](http://memkind.github.io/memkind/). You can refer to memkind's
[readme](https://github.com/memkind/memkind#memkind) for more details.
+
+## Build Gluten with HBM
+
+Gluten will internally build and link to a specific version of the Memkind
library and [hwloc](https://github.com/open-mpi/hwloc). Other dependencies
should be installed on Driver and Worker nodes first:
+
+```bash
+sudo apt install -y autoconf automake g++ libnuma-dev libtool numactl unzip
libdaxctl-dev
+```
+
+After the setup, you can build Gluten with HBM. The command below enables this
feature:
+
+```bash
+cd /path/to/gluten
+
+## The script builds four jars for spark 3.2.2, 3.3.1, 3.4.3 and 3.5.1.
+./dev/buildbundle-veloxbe.sh --enable_hbm=ON
+```
+
+## Configure and enable HBM in Spark Application
+
+At runtime, the `MEMKIND_HBW_NODES` environment variable is detected for
configuring HBM NUMA nodes. For the explanation of this variable, please refer
to memkind's manual page. It can be set for all executors through Spark conf,
e.g. `--conf spark.executorEnv.MEMKIND_HBW_NODES=8-15`. Note that memory
allocation fallback is also supported and cannot be turned off. If HBM is
unavailable or fills up, the allocator will use default (DDR) memory.
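
As a hedged illustration (the helper below is ours, not part of Gluten, and
the accepted syntax of comma-separated NUMA node ids and ranges is an
assumption based on memkind's manual), a submit script could sanity-check the
value before passing it to the executor environment:

```bash
#!/usr/bin/env bash
# Hypothetical helper: check that a MEMKIND_HBW_NODES value looks like a
# comma-separated list of NUMA node ids and ranges, e.g. "8-15" or "2,8-15".
valid_hbw_nodes() {
  [[ "$1" =~ ^[0-9]+(-[0-9]+)?(,[0-9]+(-[0-9]+)?)*$ ]]
}

HBW_NODES="8-15"
if valid_hbw_nodes "$HBW_NODES"; then
  echo "--conf spark.executorEnv.MEMKIND_HBW_NODES=$HBW_NODES"
else
  echo "invalid MEMKIND_HBW_NODES value: $HBW_NODES" >&2
fi
```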
diff --git a/docs/get-started/VeloxIAA.md b/docs/get-started/VeloxIAA.md
new file mode 100644
index 0000000000..ffd74bf6c6
--- /dev/null
+++ b/docs/get-started/VeloxIAA.md
@@ -0,0 +1,75 @@
+---
+layout: page
+title: IAA Support in Velox Backend
+nav_order: 1
+parent: Getting-Started
+---
+
+
+# Intel® In-memory Analytics Accelerator (IAA/IAX) support
+
+Similar to Intel® QAT, Gluten supports using Intel® In-memory Analytics
Accelerator (IAA, also called IAX) for data compression during Spark Shuffle.
It benefits from IAA hardware-based acceleration of compression/decompression,
and uses Gzip as the compression format for a higher compression ratio,
reducing the pressure on disks and network transmission.
+
+This feature is based on Intel® [QPL](https://github.com/intel/qpl).
+
+## Build Gluten with IAA
+
+Gluten will internally build and link to a specific version of QPL library,
but extra environment setup is still required. Please refer to [QPL
Installation
Guide](https://intel.github.io/qpl/documentation/get_started_docs/installation.html)
to install dependencies and configure accelerators.
+
+**This step is required if your application is running as a non-root user.**
+Create a group for the users who have privilege to use IAA, and grant the `iaa` group read/write access to the IAA Work-Queues.
+
+```bash
+sudo groupadd iaa
+sudo usermod -aG iaa username # need to relogin
+sudo chgrp -R iaa /dev/iax
+sudo chmod -R g+rw /dev/iax
+```
+
+After the set-up, you can now build Gluten with IAA. Use the command below to enable this feature:
+
+```bash
+cd /path/to/gluten
+
+## The script builds four jars for spark 3.2.2, 3.3.1, 3.4.3 and 3.5.1.
+./dev/buildbundle-veloxbe.sh --enable_iaa=ON
+```
+
+## Enable IAA with Gzip Compression for shuffle compression
+
+1. To enable IAA at run-time, first make sure you have configured the IAA Work-Queues correctly, and that the file permissions of /dev/iax/wqX.0 are correct.
+
+```bash
+sudo ls -l /dev/iax
+```
+
+The output should be like:
+
+```
+total 0
+crw-rw---- 1 root iaa 509, 0 Apr 5 18:54 wq1.0
+crw-rw---- 1 root iaa 509, 5 Apr 5 18:54 wq11.0
+crw-rw---- 1 root iaa 509, 6 Apr 5 18:54 wq13.0
+crw-rw---- 1 root iaa 509, 7 Apr 5 18:54 wq15.0
+crw-rw---- 1 root iaa 509, 1 Apr 5 18:54 wq3.0
+crw-rw---- 1 root iaa 509, 2 Apr 5 18:54 wq5.0
+crw-rw---- 1 root iaa 509, 3 Apr 5 18:54 wq7.0
+crw-rw---- 1 root iaa 509, 4 Apr 5 18:54 wq9.0
+```
+
+2. Extra Gluten configurations are required when starting the Spark application
+
+```
+--conf spark.gluten.sql.columnar.shuffle.codec=gzip
+--conf spark.gluten.sql.columnar.shuffle.codecBackend=iaa
+```
+
+## IAA references
+
+**Intel® IAA Enabling Guide**
+
+Check out the [Intel® In-Memory Analytics Accelerator (Intel® IAA) Enabling Guide](https://www.intel.com/content/www/us/en/developer/articles/technical/intel-iaa-enabling-guide.html)
+
+**Intel® QPL Documentation**
+
+Check out the [Intel® Query Processing Library (Intel® QPL) Documentation](https://intel.github.io/qpl/index.html)
diff --git a/docs/get-started/VeloxQAT.md b/docs/get-started/VeloxQAT.md
new file mode 100644
index 0000000000..6d33654e59
--- /dev/null
+++ b/docs/get-started/VeloxQAT.md
@@ -0,0 +1,155 @@
+---
+layout: page
+title: QAT Support in Velox Backend
+nav_order: 1
+parent: Getting-Started
+---
+
+# Intel® QuickAssist Technology (QAT) support
+
+Gluten supports using Intel® QuickAssist Technology (QAT) for data compression during Spark Shuffle. It benefits from QAT hardware-based acceleration of compression/decompression, and uses Gzip as the compression format for a higher compression ratio to reduce pressure on disks and network transmission.
+
+This feature is based on the QAT driver library and the [QATzip](https://github.com/intel/QATzip) library.
+
+## Software Requirements
+
+- Download the QAT driver for your system, and follow its README to build and install it on all Driver and Worker nodes: [Intel® QuickAssist Technology Driver for Linux* – HW Version 2.0](https://www.intel.com/content/www/us/en/download/765501/intel-quickassist-technology-driver-for-linux-hw-version-2-0.html?wapkw=quickassist).
+- The following compression libraries need to be installed on all Driver and Worker nodes:
+ - Zlib* library of version 1.2.7 or higher
+ - ZSTD* library of version 1.5.4 or higher
+ - LZ4* library
+
+## Build Gluten with QAT
+
+1. Set the ICP_ROOT environment variable to the directory where the QAT driver is extracted. This environment variable is required during building Gluten and when running Spark applications. It's recommended to put it in .bashrc on Driver and Worker nodes.
+
+```bash
+echo "export ICP_ROOT=/path/to/QAT_driver" >> ~/.bashrc
+source ~/.bashrc
+
+# Also set for root if running as non-root user
+sudo su -
+echo "export ICP_ROOT=/path/to/QAT_driver" >> ~/.bashrc
+exit
+```
+
+2. **This step is required if your application is running as a non-root user.**
+ The users must be added to the 'qat' group after the QAT driver is installed. Then raise the max locked memory limit for that group by specifying the limit in /etc/security/limits.conf.
+
+```bash
+sudo su -
+usermod -aG qat username # need relogin to take effect
+
+# To set 500MB add a line like this in /etc/security/limits.conf
+echo "@qat - memlock 500000" >> /etc/security/limits.conf
+
+exit
+```
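+
+After re-login, you can verify the new limit took effect. A sketch; the value shown should match what you configured above:
+
+```bash
+# Max locked memory in KB for the current user; expect 500000 with the example above.
+ulimit -l
+```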
+
+3. Enable huge pages. This step must be executed each time after a system reboot. We recommend using systemctl to manage it at system startup. Change the values of "max_huge_pages" and "max_huge_pages_per_process" to make sure there are enough resources for your workload. For Spark applications, one process matches one executor. Within the executor, every task is allocated a maximum of 5 huge pages.
+
+```bash
+sudo su -
+
+cat << EOF > /usr/local/bin/qat_startup.sh
+#!/bin/bash
+echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
+rmmod usdm_drv
+insmod $ICP_ROOT/build/usdm_drv.ko max_huge_pages=1024 max_huge_pages_per_process=32
+EOF
+
+chmod +x /usr/local/bin/qat_startup.sh
+
+cat << EOF > /etc/systemd/system/qat_startup.service
+[Unit]
+Description=Configure QAT
+
+[Service]
+ExecStart=/usr/local/bin/qat_startup.sh
+
+[Install]
+WantedBy=multi-user.target
+EOF
+
+systemctl enable qat_startup.service
+systemctl start qat_startup.service # setup immediately
+systemctl status qat_startup.service
+
+exit
+```
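+
+Once the service has run, a quick sanity check (a sketch; the expected counts depend on the values you chose above):
+
+```bash
+# Huge pages reserved by the startup script (1024 in the example above).
+grep HugePages_Total /proc/meminfo
+# Confirm the usdm_drv module was reloaded with the new parameters.
+lsmod | grep usdm_drv
+```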
+
+4. After the setup, you are now ready to build Gluten with QAT. Use the command below to enable this feature:
+
+```bash
+cd /path/to/gluten
+
+## The script builds four jars for spark 3.2.2, 3.3.1, 3.4.3 and 3.5.1.
+./dev/buildbundle-veloxbe.sh --enable_qat=ON
+```
+
+## Enable QAT with Gzip/Zstd for shuffle compression
+
+1. To offload shuffle compression to QAT, first make sure you have the right QAT configuration file at /etc/4xxx_devX.conf. We provide an [example configuration file](../qat/4x16.conf). This configuration sets up to 4 processes that can bind to 1 QAT, and each process can use up to 16 QAT DC instances.
+
+```bash
+## run as root
+## Overwrite QAT configuration file.
+cd /etc
+for i in {0..7}; do echo "4xxx_dev$i.conf"; done | xargs -i cp -f /path/to/gluten/docs/qat/4x16.conf {}
+## Restart QAT after updating configuration files.
+adf_ctl restart
+```
+
+2. Check QAT status and make sure the status is up
+
+```bash
+adf_ctl status
+```
+
+The output should be like:
+
+```
+Checking status of all devices.
+There is 8 QAT acceleration device(s) in the system:
+ qat_dev0 - type: 4xxx, inst_id: 0, node_id: 0, bsf: 0000:6b:00.0, #accel: 1 #engines: 9 state: up
+ qat_dev1 - type: 4xxx, inst_id: 1, node_id: 1, bsf: 0000:70:00.0, #accel: 1 #engines: 9 state: up
+ qat_dev2 - type: 4xxx, inst_id: 2, node_id: 2, bsf: 0000:75:00.0, #accel: 1 #engines: 9 state: up
+ qat_dev3 - type: 4xxx, inst_id: 3, node_id: 3, bsf: 0000:7a:00.0, #accel: 1 #engines: 9 state: up
+ qat_dev4 - type: 4xxx, inst_id: 4, node_id: 4, bsf: 0000:e8:00.0, #accel: 1 #engines: 9 state: up
+ qat_dev5 - type: 4xxx, inst_id: 5, node_id: 5, bsf: 0000:ed:00.0, #accel: 1 #engines: 9 state: up
+ qat_dev6 - type: 4xxx, inst_id: 6, node_id: 6, bsf: 0000:f2:00.0, #accel: 1 #engines: 9 state: up
+ qat_dev7 - type: 4xxx, inst_id: 7, node_id: 7, bsf: 0000:f7:00.0, #accel: 1 #engines: 9 state: up
+```
+
+3. Extra Gluten configurations are required when starting the Spark application
+
+```
+--conf spark.gluten.sql.columnar.shuffle.codec=gzip # Valid options are gzip and zstd
+--conf spark.gluten.sql.columnar.shuffle.codecBackend=qat
+```
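+
+For example, a hypothetical launch sketch using zstd (the application jar name is a placeholder; ICP_ROOT must be visible to executors as set up earlier):
+
+```bash
+# Sketch only: offload zstd shuffle compression to QAT.
+spark-submit \
+  --conf spark.gluten.sql.columnar.shuffle.codec=zstd \
+  --conf spark.gluten.sql.columnar.shuffle.codecBackend=qat \
+  your-app.jar
+```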
+
+4. You can use the command below to check whether QAT is working normally at run-time. The value of fw_counters should continue to increase during shuffle.
+
+```
+while :; do cat /sys/kernel/debug/qat_4xxx_0000:6b:00.0/fw_counters; sleep 1; done
+```
+
+## QAT driver references
+
+**Documentation**
+
+[README Text Files (README_QAT20.L.1.0.0-00021.txt)](https://downloadmirror.intel.com/765523/README_QAT20.L.1.0.0-00021.txt)
+
+**Release Notes**
+
+Check out the [Intel® QuickAssist Technology Software for Linux*](https://www.intel.com/content/www/us/en/content-details/632507/intel-quickassist-technology-intel-qat-software-for-linux-release-notes-hardware-version-2-0.html) - Release Notes for the latest changes in this release.
+
+**Getting Started Guide**
+
+Check out the [Intel® QuickAssist Technology Software for Linux*](https://www.intel.com/content/www/us/en/content-details/632506/intel-quickassist-technology-intel-qat-software-for-linux-getting-started-guide-hardware-version-2-0.html) - Getting Started Guide for detailed installation instructions.
+
+**Programmer's Guide**
+
+Check out the [Intel® QuickAssist Technology Software for Linux*](https://www.intel.com/content/www/us/en/content-details/743912/intel-quickassist-technology-intel-qat-software-for-linux-programmers-guide-hardware-version-2-0.html) - Programmer's Guide for software usage guidelines.
+
+For more Intel® QuickAssist Technology resources go to [Intel® QuickAssist Technology (Intel® QAT)](https://developer.intel.com/quickassist)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]