This is an automated email from the ASF dual-hosted git repository.
philo pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/incubator-gluten.git
The following commit(s) were added to refs/heads/main by this push:
new e47509abb [DOC] Update release & configuration doc (#4910)
e47509abb is described below
commit e47509abb3b94b2db40d5d9b8080657839609488
Author: PHILO-HE <[email protected]>
AuthorDate: Mon Mar 11 19:49:05 2024 +0800
[DOC] Update release & configuration doc (#4910)
---
.github/workflows/dev_cron/issues_link.js | 2 +-
.github/workflows/dev_cron/title_check.md | 4 +-
CONTRIBUTING.md | 10 +-
README.md | 10 +-
docs/Configuration.md | 129 ++++++++++++---------
docs/_config.yml | 2 +-
docs/contact-us.md | 2 +-
docs/developers/HowTo.md | 2 +-
docs/developers/MicroBenchmarks.md | 6 +-
docs/developers/NewToGluten.md | 4 +-
docs/developers/SubstraitModifications.md | 24 ++--
docs/developers/docker_centos7.md | 2 +-
docs/developers/docker_centos8.md | 2 +-
docs/developers/docker_ubuntu22.04.md | 2 +-
docs/get-started/ClickHouse.md | 8 +-
docs/get-started/Velox.md | 6 +-
docs/index.md | 2 +-
docs/release.md | 9 +-
docs/velox-backend-limitations.md | 4 +-
.../extension/ColumnarOverrides.scala | 6 +-
mkdocs.yml | 2 +-
pom.xml | 2 +-
.../main/scala/io/glutenproject/GlutenConfig.scala | 14 +--
tools/gluten-it/README.md | 4 +-
tools/gluten-te/centos/defaults.conf | 2 +-
tools/gluten-te/ubuntu/README.md | 8 +-
tools/gluten-te/ubuntu/defaults.conf | 2 +-
27 files changed, 145 insertions(+), 125 deletions(-)
diff --git a/.github/workflows/dev_cron/issues_link.js b/.github/workflows/dev_cron/issues_link.js
index 0b79b91a7..596bad758 100644
--- a/.github/workflows/dev_cron/issues_link.js
+++ b/.github/workflows/dev_cron/issues_link.js
@@ -48,7 +48,7 @@ async function haveComment(github, context, pullRequestNumber, body) {
 }
 async function commentISSUESURL(github, context, pullRequestNumber, issuesID) {
-  const issuesURL = `https://github.com/oap-project/gluten/issues/${issuesID}`;
+  const issuesURL = `https://github.com/apache/incubator-gluten/issues/${issuesID}`;
   if (await haveComment(github, context, pullRequestNumber, issuesURL)) {
     return;
   }
diff --git a/.github/workflows/dev_cron/title_check.md b/.github/workflows/dev_cron/title_check.md
index 6fb45bf64..83d4937ed 100644
--- a/.github/workflows/dev_cron/title_check.md
+++ b/.github/workflows/dev_cron/title_check.md
@@ -21,7 +21,7 @@ Thanks for opening a pull request!
 Could you open an issue for this pull request on Github Issues?
-https://github.com/oap-project/gluten/issues
+https://github.com/apache/incubator-gluten/issues
 Then could you also rename ***commit message*** and ***pull request title*** in the following format?
@@ -29,5 +29,5 @@ Then could you also rename ***commit message*** and ***pull request title*** in
 See also:
- * [Other pull requests](https://github.com/oap-project/gluten/pulls/)
+ * [Other pull requests](https://github.com/apache/incubator-gluten/pulls/)
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index b934cdf78..9450191dd 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -44,15 +44,15 @@ please add at least one UT to ensure code quality and reduce regression issues f
 Please update document for your proposed code change if necessary.
-If a new config property is being introduced, please update [Configuration.md](https://github.com/oap-project/gluten/blob/main/docs/Configuration.md).
+If a new config property is being introduced, please update [Configuration.md](https://github.com/apache/incubator-gluten/blob/main/docs/Configuration.md).
 ### Code Style
 ##### Java/Scala code style
-Developer can import the code style setting to IDE and format Java/Scala code with spotless maven plugin. See [Java/Scala code style](https://github.com/oap-project/gluten/blob/main/docs/developers/NewToGluten.md#javascala-code-style).
+Developer can import the code style setting to IDE and format Java/Scala code with spotless maven plugin. See [Java/Scala code style](https://github.com/apache/incubator-gluten/blob/main/docs/developers/NewToGluten.md#javascala-code-style).
 ##### C/C++ code style
-There are some code style conventions need to comply. See [CppCodingStyle.md](https://github.com/oap-project/gluten/blob/main/docs/developers/CppCodingStyle.md).
+There are some code style conventions need to comply. See [CppCodingStyle.md](https://github.com/apache/incubator-gluten/blob/main/docs/developers/CppCodingStyle.md).
 For Velox backend, developer can just execute `dev/formatcppcode.sh` to format C/C++ code. It requires `clang-format-12` installed in your development env.
@@ -68,7 +68,7 @@ You can execute a script to fix license header issue, as the following shows.
 ### Gluten CI
 ##### ClickHouse Backend CI
-To check CI failure for CH backend, please log in with the public account/password provided [here](https://github.com/oap-project/gluten/blob/main/docs/get-started/ClickHouse.md#new-ci-system).
+To check CI failure for CH backend, please log in with the public account/password provided [here](https://github.com/apache/incubator-gluten/blob/main/docs/get-started/ClickHouse.md#new-ci-system).
 To re-trigger CH CI, please post the below comment on PR page: `Run Gluten Clickhouse CI`
@@ -79,7 +79,7 @@ To check CI failure for Velox backend, please go into the GitHub action page fro
 To see the perf. impact on Velox backend, you can comment `/Benchmark Velox` on PR page to trigger a pretest. The benchmark (currently TPC-H) result will be posted after completed.
-If some new dependency is required to be installed, you may need to do some change for CI docker at [this folder](https://github.com/oap-project/gluten/tree/main/tools/gluten-te).
+If some new dependency is required to be installed, you may need to do some change for CI docker at [this folder](https://github.com/apache/incubator-gluten/tree/main/tools/gluten-te).
 ### Code Review
diff --git a/README.md b/README.md
index cc88404d9..0a18d10c8 100644
--- a/README.md
+++ b/README.md
@@ -1,5 +1,7 @@
-# Gluten: Plugin to Double SparkSQL's Performance
+# Apache Gluten (Incubating): A Middle Layer for Offloading JVM-based SQL Engines' Execution to Native Engines
+
 [](https://www.bestpractices.dev/projects/8452)
+
 *<b>This project is still under active development now, and doesn't have a stable release. Welcome to evaluate it.</b>*
 # 1 Introduction
@@ -30,7 +32,7 @@ The basic rule of Gluten's design is that we would reuse spark's whole control f
 ## 1.3 Target User
 Gluten's target user is anyone who wants to accelerate SparkSQL fundamentally.
 As a plugin to Spark, Gluten doesn't require any change for dataframe API or SQL query, but only requires user to make correct configuration.
-See Gluten configuration properties [here](https://github.com/oap-project/gluten/blob/main/docs/Configuration.md).
+See Gluten configuration properties [here](https://github.com/apache/incubator-gluten/blob/main/docs/Configuration.md).
 ## 1.4 References
@@ -72,7 +74,7 @@ spark-shell \
   --conf spark.memory.offHeap.enabled=true \
   --conf spark.memory.offHeap.size=20g \
   --conf spark.shuffle.manager=org.apache.spark.shuffle.sort.ColumnarShuffleManager \
-  --jars https://github.com/oap-project/gluten/releases/download/v1.0.0/gluten-velox-bundle-spark3.2_2.12-ubuntu_20.04_x86_64-1.0.0.jar
+  --jars https://github.com/apache/incubator-gluten/releases/download/v1.0.0/gluten-velox-bundle-spark3.2_2.12-ubuntu_20.04_x86_64-1.0.0.jar
 ```
 # 3.2 Custom Build
@@ -118,7 +120,7 @@ Please feel free to create Github issue for reporting bug or proposing enhanceme
 ## 4.3 Documentation
-Currently, all gluten documents are held at [docs](https://github.com/oap-project/gluten/tree/main/docs). The documents may not reflect the latest designs. Please feel free to contact us for getting design details or sharing your design ideas.
+Currently, all gluten documents are held at [docs](https://github.com/apache/incubator-gluten/tree/main/docs). The documents may not reflect the latest designs. Please feel free to contact us for getting design details or sharing your design ideas.
 # 5 Performance
diff --git a/docs/Configuration.md b/docs/Configuration.md
index a6e0b7015..626000bc4 100644
--- a/docs/Configuration.md
+++ b/docs/Configuration.md
@@ -11,75 +11,90 @@ You can add these configurations into spark-defaults.conf to enable or disable t
 ## Spark parameters
-| Parameters | Description [...]
-|----------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------ [...]
-| spark.driver.extraClassPath | To add Gluten Plugin jar file in Spark Driver [...]
-| spark.executor.extraClassPath | To add Gluten Plugin jar file in Spark Executor [...]
-| spark.executor.memory | To set up how much memory to be used for Spark Executor. [...]
-| spark.memory.offHeap.size | To set up how much memory to be used for Java OffHeap.<br /> Please notice Gluten Plugin will leverage this setting to allocate memory space for native usage even offHeap is disabled. <br /> The value is based on your system and it is recommended to set it larger if you are facing Out of Memory issue in Gluten Plugin [...]
-| spark.sql.sources.useV1SourceList | Choose to use V1 source [...]
-| spark.sql.join.preferSortMergeJoin | To turn off preferSortMergeJoin in Spark [...]
-| spark.plugins | To load Gluten's components by Spark's plug-in loader [...]
-| spark.shuffle.manager | To turn on Gluten Columnar Shuffle Plugin [...]
-| spark.gluten.enabled | Enable Gluten, default is true. Just an experimental property. Recommend to enable/disable Gluten through the setting for `spark.plugins`. [...]
-| spark.gluten.sql.columnar.maxBatchSize | Number of rows to be processed in each batch. Default value is 4096. [...]
-| spark.gluten.memory.isolation | (Experimental) Enable isolated memory mode. If true, Gluten controls the maximum off-heap memory can be used by each task to X, X = executor memory / max task slots. It's recommended to set true if Gluten serves concurrent queries within a single session, since not all memory Gluten allocated is guaranteed to be spillable. In the case, the feature should be enabled to avoid OOM. Note when true, setting spark.memory. [...]
-| spark.gluten.sql.columnar.scanOnly | When enabled, this config will overwrite all other operators' enabling, and only Scan and Filter pushdown will be offloaded to native. [...]
-| spark.gluten.sql.columnar.batchscan | Enable or Disable Columnar BatchScan, default is true [...]
-| spark.gluten.sql.columnar.hashagg | Enable or Disable Columnar Hash Aggregate, default is true [...]
-| spark.gluten.sql.columnar.project | Enable or Disable Columnar Project, default is true [...]
-| spark.gluten.sql.columnar.filter | Enable or Disable Columnar Filter, default is true [...]
-| spark.gluten.sql.columnar.sort | Enable or Disable Columnar Sort, default is true [...]
-| spark.gluten.sql.columnar.window | Enable or Disable Columnar Window, default is true [...]
-| spark.gluten.sql.columnar.shuffledHashJoin | Enable or Disable ShuffledHashJoin, default is true [...]
-| spark.gluten.sql.columnar.forceShuffledHashJoin | Force to use ShuffledHashJoin over SortMergeJoin, default is true. For queries that can benefit from storaged patitioned join, please set it to false. [...]
-| spark.gluten.sql.columnar.sortMergeJoin | Enable or Disable Columnar Sort Merge Join, default is true [...]
-| spark.gluten.sql.columnar.union | Enable or Disable Columnar Union, default is true [...]
-| spark.gluten.sql.columnar.expand | Enable or Disable Columnar Expand, default is true [...]
-| spark.gluten.sql.columnar.generate | Enable or Disable Columnar Generate, default is true [...]
-| spark.gluten.sql.columnar.limit | Enable or Disable Columnar Limit, default is true [...]
-| spark.gluten.sql.columnar.tableCache | Enable or Disable Columnar Table Cache, default is false [...]
-| spark.gluten.sql.columnar.broadcastExchange | Enable or Disable Columnar Broadcast Exchange, default is true [...]
-| spark.gluten.sql.columnar.broadcastJoin | Enable or Disable Columnar BroadcastHashJoin, default is true [...]
-| spark.gluten.sql.columnar.shuffle.codec | Set up the codec to be used for Columnar Shuffle. If this configuration is not set, will check the value of spark.io.compression.codec. By default, Gluten use software compression. Valid options for software compression are lz4, zstd. Valid options for QAT and IAA is gzip. [...]
-| spark.gluten.sql.columnar.shuffle.codecBackend | Enable using hardware accelerators for shuffle de/compression. Valid options are QAT and IAA. [...]
-| spark.gluten.sql.columnar.shuffle.compressionMode | Setting different compression mode in shuffle, Valid options are buffer and rowvector, buffer option compress each buffer of RowVector individually into one pre-allocated large buffer, rowvector option first copies each buffer of RowVector to a large buffer and then compress the entire buffer in one go. [...]
-| spark.gluten.sql.columnar.shuffle.compression.threshold | If number of rows in a batch falls below this threshold, will copy all buffers into one buffer to compress. [...]
-| spark.gluten.sql.columnar.shuffle.realloc.threshold | Set the threshold to dynamically adjust the size of shuffle split buffers. The size of each split buffer is recalculated for each incoming batch of data. If the new size deviates from the current partition buffer size by a factor outside the range of [1 - threshold, 1 + threshold], the split buffer will be re-allocated using the newly calculated size [...]
-| spark.gluten.sql.columnar.shuffle.merge.threshold | Set the threshold control the minimum merged size. When a partition buffer is full, and the number of rows is below (`threshold * spark.gluten.sql.columnar.maxBatchSize`), it will be saved for merging. [...]
-| spark.gluten.sql.columnar.numaBinding | Set up NUMABinding, default is false [...]
-| spark.gluten.sql.columnar.coreRange | Set up the core range for NUMABinding, only works when numaBinding set to true. <br /> The setting is based on the number of cores in your system. Use 72 cores as an example. [...]
-| spark.gluten.sql.native.bloomFilter | Enable or Disable native runtime bloom filter. [...]
-| spark.gluten.sql.columnar.wholeStage.fallback.threshold | Configure the threshold for whether whole stage will fall back in AQE supported case by counting the number of ColumnarToRow & vanilla leaf node [...]
-| spark.gluten.sql.columnar.query.fallback.threshold | Configure the threshold for whether query will fall back by counting the number of ColumnarToRow & vanilla leaf node [...]
-| spark.gluten.sql.columnar.fallback.ignoreRowToColumnar | When true, the fallback policy ignores the RowToColumnar when counting fallback number. [...]
-| spark.gluten.sql.columnar.fallback.preferColumnar | When true, the fallback policy prefers to use Gluten plan rather than vanilla Spark plan if the both of them contains ColumnarToRow and the vanilla Spark plan ColumnarToRow number is not smaller than Gluten plan. [...]
-| spark.gluten.sql.columnar.maxBatchSize | Set the number of rows for the output batch [...]
-| spark.gluten.shuffleWriter.bufferSize | Set the number of buffer rows for the shuffle writer [...]
-| spark.gluten.loadLibFromJar | Controls whether to load dynamic link library from a packed jar for gluten/cpp. Not applicable to static build and clickhouse backend. [...]
-| spark.gluten.sql.columnar.force.hashagg | Force to use hash agg to replace sort agg. [...]
-| spark.gluten.sql.columnar.vanillaReaders | Enable vanilla spark's vectorized reader. Please note it may bring perf. overhead due to extra data transition. We recommend to disable it if most queries can be fully offloaded to gluten. [...]
-| spark.gluten.expression.blacklist | A black list of expression to skip transform, multiple values separated by commas. [...]
+| Parameters | Description [...]
+|------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------ [...]
+| spark.driver.extraClassPath | To add Gluten Plugin jar file in Spark Driver [...]
+| spark.executor.extraClassPath | To add Gluten Plugin jar file in Spark Executor [...]
+| spark.executor.memory | To set up how much memory to be used for Spark Executor. [...]
+| spark.memory.offHeap.size | To set up how much memory to be used for Java OffHeap.<br /> Please notice Gluten Plugin will leverage this setting to allocate memory space for native usage even offHeap is disabled. <br /> The value is based on your system and it is recommended to set it larger if you are facing Out of Memory issue in Gluten Plugin [...]
+| spark.sql.sources.useV1SourceList | Choose to use V1 source [...]
+| spark.sql.join.preferSortMergeJoin | To turn off preferSortMergeJoin in Spark [...]
+| spark.plugins | To load Gluten's components by Spark's plug-in loader [...]
+| spark.shuffle.manager | To turn on Gluten Columnar Shuffle Plugin [...]
+| spark.gluten.enabled | Enable Gluten, default is true. Just an experimental property. Recommend to enable/disable Gluten through the setting for `spark.plugins`. [...]
+| spark.gluten.sql.columnar.maxBatchSize | Number of rows to be processed in each batch. Default value is 4096. [...]
+| spark.gluten.memory.isolation | (Experimental) Enable isolated memory mode. If true, Gluten controls the maximum off-heap memory can be used by each task to X, X = executor memory / max task slots. It's recommended to set true if Gluten serves concurrent queries within a single session, since not all memory Gluten allocated is guaranteed to be spillable. In the case, the feature should be enabled to avoid OOM. Note when true, setting spark.memory.storageFra [...]
+| spark.gluten.sql.columnar.scanOnly | When enabled, this config will overwrite all other operators' enabling, and only Scan and Filter pushdown will be offloaded to native. [...]
+| spark.gluten.sql.columnar.batchscan | Enable or Disable Columnar BatchScan, default is true [...]
+| spark.gluten.sql.columnar.hashagg | Enable or Disable Columnar Hash Aggregate, default is true [...]
+| spark.gluten.sql.columnar.project | Enable or Disable Columnar Project, default is true [...]
+| spark.gluten.sql.columnar.filter | Enable or Disable Columnar Filter, default is true [...]
+| spark.gluten.sql.columnar.sort | Enable or Disable Columnar Sort, default is true [...]
+| spark.gluten.sql.columnar.window | Enable or Disable Columnar Window, default is true [...]
+| spark.gluten.sql.columnar.shuffledHashJoin | Enable or Disable ShuffledHashJoin, default is true [...]
+| spark.gluten.sql.columnar.forceShuffledHashJoin | Force to use ShuffledHashJoin over SortMergeJoin, default is true. For queries that can benefit from storaged patitioned join, please set it to false. [...]
+| spark.gluten.sql.columnar.sortMergeJoin | Enable or Disable Columnar Sort Merge Join, default is true [...]
+| spark.gluten.sql.columnar.union | Enable or Disable Columnar Union, default is true [...]
+| spark.gluten.sql.columnar.expand | Enable or Disable Columnar Expand, default is true [...]
+| spark.gluten.sql.columnar.generate | Enable or Disable Columnar Generate, default is true [...]
+| spark.gluten.sql.columnar.limit | Enable or Disable Columnar Limit, default is true [...]
+| spark.gluten.sql.columnar.tableCache | Enable or Disable Columnar Table Cache, default is false [...]
+| spark.gluten.sql.columnar.broadcastExchange | Enable or Disable Columnar Broadcast Exchange, default is true [...]
+| spark.gluten.sql.columnar.broadcastJoin | Enable or Disable Columnar BroadcastHashJoin, default is true [...]
+| spark.gluten.sql.columnar.shuffle.codec | Set up the codec to be used for Columnar Shuffle. If this configuration is not set, will check the value of spark.io.compression.codec. By default, Gluten use software compression. Valid options for software compression are lz4, zstd. Valid options for QAT and IAA is gzip. [...]
+| spark.gluten.sql.columnar.shuffle.codecBackend | Enable using hardware accelerators for shuffle de/compression. Valid options are QAT and IAA. [...]
+| spark.gluten.sql.columnar.shuffle.compressionMode | Setting different compression mode in shuffle, Valid options are buffer and rowvector, buffer option compress each buffer of RowVector individually into one pre-allocated large buffer, rowvector option first copies each buffer of RowVector to a large buffer and then compress the entire buffer in one go. [...]
+| spark.gluten.sql.columnar.shuffle.compression.threshold | If number of rows in a batch falls below this threshold, will copy all buffers into one buffer to compress. [...]
+| spark.gluten.sql.columnar.shuffle.realloc.threshold | Set the threshold to dynamically adjust the size of shuffle split buffers. The size of each split buffer is recalculated for each incoming batch of data. If the new size deviates from the current partition buffer size by a factor outside the range of [1 - threshold, 1 + threshold], the split buffer will be re-allocated using the newly calculated size [...]
+| spark.gluten.sql.columnar.shuffle.merge.threshold | Set the threshold control the minimum merged size. When a partition buffer is full, and the number of rows is below (`threshold * spark.gluten.sql.columnar.maxBatchSize`), it will be saved for merging. [...]
+| spark.gluten.sql.columnar.numaBinding | Set up NUMABinding, default is false [...]
+| spark.gluten.sql.columnar.coreRange | Set up the core range for NUMABinding, only works when numaBinding set to true. <br /> The setting is based on the number of cores in your system. Use 72 cores as an example. [...]
+| spark.gluten.sql.native.bloomFilter | Enable or Disable native runtime bloom filter. [...]
+| spark.gluten.sql.columnar.wholeStage.fallback.threshold | Configure the threshold for whether whole stage will fall back in AQE supported case by counting the number of ColumnarToRow & vanilla leaf node [...]
+| spark.gluten.sql.columnar.query.fallback.threshold | Configure the threshold for whether query will fall back by counting the number of ColumnarToRow & vanilla leaf node [...]
+| spark.gluten.sql.columnar.fallback.ignoreRowToColumnar | When true, the fallback policy ignores the RowToColumnar when counting fallback number. [...]
+| spark.gluten.sql.columnar.fallback.preferColumnar | When true, the fallback policy prefers to use Gluten plan rather than vanilla Spark plan if the both of them contains ColumnarToRow and the vanilla Spark plan ColumnarToRow number is not smaller than Gluten plan. [...]
+| spark.gluten.sql.columnar.maxBatchSize | Set the number of rows for the output batch. [...]
+| spark.gluten.shuffleWriter.bufferSize | Set the number of buffer rows for the shuffle writer [...]
+| spark.gluten.loadLibFromJar | Controls whether to load dynamic link library from a packed jar for gluten/cpp. Not applicable to static build and clickhouse backend. [...]
+| spark.gluten.sql.columnar.force.hashagg | Force to use hash agg to replace sort agg. [...]
+| spark.gluten.sql.columnar.vanillaReaders | Enable vanilla spark's vectorized reader. Please note it may bring perf. overhead due to extra data transition. We recommend to disable it if most queries can be fully offloaded to gluten. [...]
+| spark.gluten.expression.blacklist | A black list of expression to skip transform, multiple values separated by commas. [...]
+| spark.gluten.sql.columnar.fallback.expressions.threshold | Fall back filter/project if the height of expression tree reaches this threshold, considering Spark codegen can bring better performance for such case. [...]
+| spark.gluten.sql.cartesianProductTransformerEnabled | Config to enable CartesianProductExecTransformer. [...]
+ | spark.gluten.sql.broadcastNestedLoopJoinTransformerEnabled | Config to enable BroadcastNestedLoopJoinExecTransformer. [...]
+ | spark.gluten.sql.cacheWholeStageTransformerContext | When true, `WholeStageTransformer` will cache the `WholeStageTransformerContext` when executing. It is used to get substrait plan node and native plan string. [...]
+ | spark.gluten.sql.injectNativePlanStringToExplain | When true, Gluten will inject native plan tree to explain string inside `WholeStageTransformerContext`. [...]
 ## Velox Parameters
 The following configurations are related to Velox settings.
-| Parameters | Description [...]
-|----------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------ [...]
-| spark.gluten.sql.columnar.backend.velox.bloomFilter.expectedNumItems | The default number of expected items for the velox bloomfilter. [...]
-| spark.gluten.sql.columnar.backend.velox.bloomFilter.numBits | The default number of bits to use for the velox bloom filter. [...]
-| spark.gluten.sql.columnar.backend.velox.bloomFilter.maxNumBits | The max number of bits to use for the velox bloom filter. [...]
+| Parameters | Description | Recommend Setting |
+|----------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------|-------------------|
+| spark.gluten.sql.columnar.backend.velox.bloomFilter.expectedNumItems | The default number of expected items for the velox bloomfilter. | 1000000L |
+| spark.gluten.sql.columnar.backend.velox.bloomFilter.numBits | The default number of bits to use for the velox bloom filter. | 8388608L |
+| spark.gluten.sql.columnar.backend.velox.bloomFilter.maxNumBits | The max number of bits to use for the velox bloom filter. | 4194304L |
+ | spark.gluten.sql.columnar.backend.velox.fileHandleCacheEnabled | Disables caching if false. File handle cache should be disabled if files are mutable, i.e. file content may change while file path stays the same. | |
+ | spark.gluten.sql.columnar.backend.velox.directorySizeGuess | Set the directory size guess for velox file scan. | |
+ | spark.gluten.sql.columnar.backend.velox.filePreloadThreshold | Set the file preload threshold for velox file scan. | |
+ | spark.gluten.sql.columnar.backend.velox.prefetchRowGroups | Set the prefetch row groups for velox file scan. | |
+ | spark.gluten.sql.columnar.backend.velox.loadQuantum | Set the load quantum for velox file scan. | |
+| spark.gluten.sql.columnar.backend.velox.maxCoalescedDistanceBytes | Set the max coalesced distance bytes for velox file scan. | |
+| spark.gluten.sql.columnar.backend.velox.maxCoalescedBytes | Set the max coalesced bytes for velox file scan. | |
+| spark.gluten.sql.columnar.backend.velox.cachePrefetchMinPct | Set prefetch cache min pct for velox file scan. | |
+| spark.gluten.velox.awsSdkLogLevel | Log granularity of AWS C++ SDK in velox. | FATAL |
+| spark.gluten.sql.columnar.backend.velox.orc.scan.enabled | Enable velox orc scan. If disabled, vanilla spark orc scan will be used. | true |
+| spark.gluten.sql.complexType.scan.fallback.enabled | Force fallback for complex type scan, including struct, map, array. | true |
-Below is an example for spark-default.conf:
 ```
 ##### Columnar Process Configuration
 spark.plugins io.glutenproject.GlutenPlugin
 spark.shuffle.manager org.apache.spark.shuffle.sort.ColumnarShuffleManager
-spark.driver.extraClassPath ${GLUTEN_HOME}/package/target/gluten-<>-jar-with-dependencies.jar
-spark.executor.extraClassPath ${GLUTEN_HOME}/package/target/gluten-<>-jar-with-dependencies.jar
+spark.driver.extraClassPath ${GLUTEN_HOME}/package/target/gluten-XXX.jar
+spark.executor.extraClassPath ${GLUTEN_HOME}/package/target/gluten-XXX.jar
 ######
 ```
diff --git a/docs/_config.yml b/docs/_config.yml
index c7afd9fcf..0d42e06f4 100644
--- a/docs/_config.yml
+++ b/docs/_config.yml
@@ -16,7 +16,7 @@ remote_theme: pmarsceill/just-the-docs
 aux_links:
   "Gluten on Github":
-    - "//github.com/oap-project/gluten"
+    - "//github.com/apache/incubator-gluten"
 plugins:
   - jekyll-optional-front-matter # GitHub Pages
diff --git a/docs/contact-us.md b/docs/contact-us.md
index d80a6a18f..7c7540401 100644
--- a/docs/contact-us.md
+++ b/docs/contact-us.md
@@ -32,4 +32,4 @@ If you need any help or have questions on this product, please contact us:
 ## Issues and Discussions
 We use github to track bugs, feature requests, and answer questions. File an
-[issue](https://github.com/oap-project/gluten/issues) for a bug or feature request.
+[issue](https://github.com/apache/incubator-gluten/issues) for a bug or feature request.
diff --git a/docs/developers/HowTo.md b/docs/developers/HowTo.md
index 27ede7fe0..587e1b9a2 100644
--- a/docs/developers/HowTo.md
+++ b/docs/developers/HowTo.md
@@ -115,7 +115,7 @@ gdb gluten_home/cpp/build/releases/libgluten.so 'core-Executor task l-2000883-16
 Now, both Parquet and DWRF format files are supported, related scripts and files are under the directory of `gluten_home/backends-velox/workload/tpch`.
 The file `README.md` under `gluten_home/backends-velox/workload/tpch` offers some useful help but it's still not enough and exact.
-One way of run TPC-H test is to run velox-be by workflow, you can refer to [velox_be.yml](https://github.com/oap-project/gluten/blob/main/.github/workflows/velox_be.yml#L90)
+One way of run TPC-H test is to run velox-be by workflow, you can refer to [velox_be.yml](https://github.com/apache/incubator-gluten/blob/main/.github/workflows/velox_be.yml#L90)
 Here will explain how to run TPC-H on Velox backend with the Parquet file format.
 1. First step, prepare the datasets, you have two choices.
diff --git a/docs/developers/MicroBenchmarks.md b/docs/developers/MicroBenchmarks.md
index 1fa6d7948..5c83c76db 100644
--- a/docs/developers/MicroBenchmarks.md
+++ b/docs/developers/MicroBenchmarks.md
@@ -58,9 +58,9 @@ Run micro benchmark with the generated files as input. You need to specify the *
 ```shell
 cd /path/to/gluten/cpp/build/velox/benchmarks
 ./generic_benchmark \
---plan /home/sparkuser/github/oap-project/gluten/backends-velox/generated-native-benchmark/example.json \
---data /home/sparkuser/github/oap-project/gluten/backends-velox/generated-native-benchmark/example_orders/part-00000-1e66fb98-4dd6-47a6-8679-8625dbc437ee-c000.snappy.parquet,\
--/home/sparkuser/github/oap-project/gluten/backends-velox/generated-native-benchmark/example_lineitem/part-00000-3ec19189-d20e-4240-85ae-88631d46b612-c000.snappy.parquet \
+--plan /home/sparkuser/github/apache/incubator-gluten/backends-velox/generated-native-benchmark/example.json \
+--data /home/sparkuser/github/apache/incubator-gluten/backends-velox/generated-native-benchmark/example_orders/part-00000-1e66fb98-4dd6-47a6-8679-8625dbc437ee-c000.snappy.parquet,\
+/home/sparkuser/github/apache/incubator-gluten/backends-velox/generated-native-benchmark/example_lineitem/part-00000-3ec19189-d20e-4240-85ae-88631d46b612-c000.snappy.parquet \
 --threads 1 --iterations 1 --noprint-result --benchmark_filter=InputFromBatchStream
 ```
diff --git a/docs/developers/NewToGluten.md b/docs/developers/NewToGluten.md
index 2cf67dcf6..04074d4e6 100644
--- a/docs/developers/NewToGluten.md
+++ b/docs/developers/NewToGluten.md
@@ -360,7 +360,7 @@ wait to attach....
# Run TPC-H and TPC-DS
We supply `<gluten_home>/tools/gluten-it` to execute these queries.
-Refer to
[velox_be.yml](https://github.com/oap-project/gluten/blob/main/.github/workflows/velox_be.yml)
+Refer to
[velox_be.yml](https://github.com/apache/incubator-gluten/blob/main/.github/workflows/velox_be.yml)
# Run gluten+velox on clean machine
@@ -371,7 +371,7 @@ spark-shell --name run_gluten \
--conf spark.plugins=io.glutenproject.GlutenPlugin \
--conf spark.memory.offHeap.enabled=true \
--conf spark.memory.offHeap.size=20g \
- --jars
https://github.com/oap-project/gluten/releases/download/v1.0.0/gluten-velox-bundle-spark3.2_2.12-ubuntu_20.04_x86_64-1.0.0.jar
\
+ --jars
https://github.com/apache/incubator-gluten/releases/download/v1.0.0/gluten-velox-bundle-spark3.2_2.12-ubuntu_20.04_x86_64-1.0.0.jar
\
--conf
spark.shuffle.manager=org.apache.spark.shuffle.sort.ColumnarShuffleManager
```
diff --git a/docs/developers/SubstraitModifications.md
b/docs/developers/SubstraitModifications.md
index 1d97d58c7..a0080aa8a 100644
--- a/docs/developers/SubstraitModifications.md
+++ b/docs/developers/SubstraitModifications.md
@@ -17,19 +17,19 @@ alternatives like `AdvancedExtension` could be considered.
## Modifications to algebra.proto
-* Added `JsonReadOptions` and `TextReadOptions` in
`FileOrFiles`([#1584](https://github.com/oap-project/gluten/pull/1584)).
-* Changed join type `JOIN_TYPE_SEMI` to `JOIN_TYPE_LEFT_SEMI` and
`JOIN_TYPE_RIGHT_SEMI`([#408](https://github.com/oap-project/gluten/pull/408)).
+* Added `JsonReadOptions` and `TextReadOptions` in
`FileOrFiles`([#1584](https://github.com/apache/incubator-gluten/pull/1584)).
+* Changed join type `JOIN_TYPE_SEMI` to `JOIN_TYPE_LEFT_SEMI` and
`JOIN_TYPE_RIGHT_SEMI`([#408](https://github.com/apache/incubator-gluten/pull/408)).
* Added `WindowRel`, added `column_name` and `window_type` in `WindowFunction`,
-changed `Unbounded` in `WindowFunction` into `Unbounded_Preceding` and
`Unbounded_Following`, and added
WindowType([#485](https://github.com/oap-project/gluten/pull/485)).
-* Added `output_schema` in
RelRoot([#1901](https://github.com/oap-project/gluten/pull/1901)).
-* Added `ExpandRel`([#1361](https://github.com/oap-project/gluten/pull/1361)).
-* Added `GenerateRel`([#574](https://github.com/oap-project/gluten/pull/574)).
-* Added `PartitionColumn` in
`LocalFiles`([#2405](https://github.com/oap-project/gluten/pull/2405)).
-* Added `WriteRel` ([#3690](https://github.com/oap-project/gluten/pull/3690)).
+changed `Unbounded` in `WindowFunction` into `Unbounded_Preceding` and
`Unbounded_Following`, and added
WindowType([#485](https://github.com/apache/incubator-gluten/pull/485)).
+* Added `output_schema` in
RelRoot([#1901](https://github.com/apache/incubator-gluten/pull/1901)).
+* Added
`ExpandRel`([#1361](https://github.com/apache/incubator-gluten/pull/1361)).
+* Added
`GenerateRel`([#574](https://github.com/apache/incubator-gluten/pull/574)).
+* Added `PartitionColumn` in
`LocalFiles`([#2405](https://github.com/apache/incubator-gluten/pull/2405)).
+* Added `WriteRel`
([#3690](https://github.com/apache/incubator-gluten/pull/3690)).
## Modifications to type.proto
-* Added `Nothing` in
`Type`([#791](https://github.com/oap-project/gluten/pull/791)).
-* Added `names` in
`Struct`([#1878](https://github.com/oap-project/gluten/pull/1878)).
-* Added `PartitionColumns` in
`NamedStruct`([#320](https://github.com/oap-project/gluten/pull/320)).
-* Remove `PartitionColumns` and add `column_types` in
`NamedStruct`([#2405](https://github.com/oap-project/gluten/pull/2405)).
+* Added `Nothing` in
`Type`([#791](https://github.com/apache/incubator-gluten/pull/791)).
+* Added `names` in
`Struct`([#1878](https://github.com/apache/incubator-gluten/pull/1878)).
+* Added `PartitionColumns` in
`NamedStruct`([#320](https://github.com/apache/incubator-gluten/pull/320)).
+* Removed `PartitionColumns` and added `column_types` in
`NamedStruct`([#2405](https://github.com/apache/incubator-gluten/pull/2405)).
diff --git a/docs/developers/docker_centos7.md
b/docs/developers/docker_centos7.md
index 6ecc38c4c..2594a8d1f 100644
--- a/docs/developers/docker_centos7.md
+++ b/docs/developers/docker_centos7.md
@@ -43,7 +43,7 @@ ln -s /usr/bin/cmake3 /usr/local/bin/cmake
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk
export PATH=$JAVA_HOME/bin:$PATH
-git clone https://github.com/oap-project/gluten.git
+git clone https://github.com/apache/incubator-gluten.git
cd gluten
# To access HDFS or S3, you need to add the parameters `--enable_hdfs=ON` and
`--enable_s3=ON`
diff --git a/docs/developers/docker_centos8.md
b/docs/developers/docker_centos8.md
index 530ef2957..dd94413bf 100755
--- a/docs/developers/docker_centos8.md
+++ b/docs/developers/docker_centos8.md
@@ -41,7 +41,7 @@ mv apache-maven-3.8.8 /usr/lib/maven
export MAVEN_HOME=/usr/lib/maven
export PATH=${PATH}:${MAVEN_HOME}/bin
-git clone https://github.com/oap-project/gluten.git
+git clone https://github.com/apache/incubator-gluten.git
cd gluten
# To access HDFS or S3, you need to add the parameters `--enable_hdfs=ON` and
`--enable_s3=ON`
diff --git a/docs/developers/docker_ubuntu22.04.md
b/docs/developers/docker_ubuntu22.04.md
index e1c03b45f..478e2f792 100644
--- a/docs/developers/docker_ubuntu22.04.md
+++ b/docs/developers/docker_ubuntu22.04.md
@@ -45,7 +45,7 @@ dpkg --configure -a
#export https_proxy=xxxx
#clone gluten
-git clone https://github.com/oap-project/gluten.git
+git clone https://github.com/apache/incubator-gluten.git
cd gluten/
#config maven proxy
diff --git a/docs/get-started/ClickHouse.md b/docs/get-started/ClickHouse.md
index 143cfb251..cbf9e44b2 100644
--- a/docs/get-started/ClickHouse.md
+++ b/docs/get-started/ClickHouse.md
@@ -43,7 +43,7 @@ You need to install the following software manually:
Then, get Gluten code:
```
- git clone https://github.com/oap-project/gluten.git
+ git clone https://github.com/apache/incubator-gluten.git
```
#### Setup ClickHouse backend development environment
@@ -105,7 +105,7 @@ Otherwise, do:
In case you don't want a development environment, you can use the following
command to compile the ClickHouse backend directly:
```
-git clone https://github.com/oap-project/gluten.git
+git clone https://github.com/apache/incubator-gluten.git
cd gluten
bash ./ep/build-clickhouse/src/build_clickhouse.sh
```
@@ -122,7 +122,7 @@ The prerequisites are the same as the one mentioned above.
Compile Gluten with C
- for Spark 3.2.2<span id="deploy-spark-322"></span>
```
- git clone https://github.com/oap-project/gluten.git
+ git clone https://github.com/apache/incubator-gluten.git
cd gluten/
export MAVEN_OPTS="-Xmx8g -XX:ReservedCodeCacheSize=2g"
mvn clean install -Pbackends-clickhouse -Phadoop-2.7.4 -Pspark-3.2
-Dhadoop.version=2.8.5 -DskipTests -Dcheckstyle.skip
@@ -132,7 +132,7 @@ The prerequisites are the same as the one mentioned above.
Compile Gluten with C
- for Spark 3.3.1
```
- git clone https://github.com/oap-project/gluten.git
+ git clone https://github.com/apache/incubator-gluten.git
cd gluten/
export MAVEN_OPTS="-Xmx8g -XX:ReservedCodeCacheSize=2g"
mvn clean install -Pbackends-clickhouse -Phadoop-2.7.4 -Pspark-3.3
-Dhadoop.version=2.8.5 -DskipTests -Dcheckstyle.skip
diff --git a/docs/get-started/Velox.md b/docs/get-started/Velox.md
index be6c00a54..79ea501da 100644
--- a/docs/get-started/Velox.md
+++ b/docs/get-started/Velox.md
@@ -50,7 +50,7 @@ export PATH=$JAVA_HOME/bin:$PATH
## config maven, like proxy in ~/.m2/settings.xml
## fetch gluten code
-git clone https://github.com/oap-project/gluten.git
+git clone https://github.com/apache/incubator-gluten.git
```
# Build Gluten with Velox Backend
@@ -152,7 +152,7 @@ cp /path/to/hdfs-client.xml hdfs-client.xml
One typical deployment on Spark/HDFS cluster is to enable [short-circuit
reading](https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/ShortCircuitLocalReads.html).
Short-circuit reads provide a substantial performance boost to many
applications.
-By default libhdfs3 does not set the default hdfs domain socket path to
support HDFS short-circuit read. If this feature is required in HDFS setup,
users may need to setup the domain socket path correctly by patching the
libhdfs3 source code or by setting the correct config environment. In Gluten
the short-circuit domain socket path is set to "/var/lib/hadoop-hdfs/dn_socket"
in
[build_velox.sh](https://github.com/oap-project/gluten/blob/main/ep/build-velox/src/build_velox.sh)
So we need [...]
+By default, libhdfs3 does not set a default HDFS domain socket path to
support HDFS short-circuit reads. If this feature is required in your HDFS
setup, you may need to set up the domain socket path correctly by patching the
libhdfs3 source code or by setting the correct config environment. In Gluten,
the short-circuit domain socket path is set to "/var/lib/hadoop-hdfs/dn_socket"
in
[build_velox.sh](https://github.com/apache/incubator-gluten/blob/main/ep/build-velox/src/build_velox.sh).
So we [...]
```
sudo mkdir -p /var/lib/hadoop-hdfs/
@@ -299,7 +299,7 @@ Spark3.3 has 387 functions in total. ~240 are commonly
used. Velox's functions h
To identify what can be offloaded in a query, along with detailed fallback
reasons, users can follow the steps below to retrieve the corresponding logs.
```
-1) Enable Gluten by proper
[configuration](https://github.com/oap-project/gluten/blob/main/docs/Configuration.md).
+1) Enable Gluten by proper
[configuration](https://github.com/apache/incubator-gluten/blob/main/docs/Configuration.md).
2) Disable Spark AQE to trigger plan validation in Gluten
spark.sql.adaptive.enabled = false
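
As an illustrative sketch of the two steps above (not part of this commit), a
spark-shell launch that enables Gluten and disables AQE might look like the
following; the jar path and memory sizes are placeholders:

```shell
# Hypothetical spark-shell invocation: enable the Gluten plugin, give it
# off-heap memory, and disable AQE so plan validation (and fallback logging)
# is triggered eagerly. All paths and sizes below are placeholders.
spark-shell \
  --conf spark.plugins=io.glutenproject.GlutenPlugin \
  --conf spark.memory.offHeap.enabled=true \
  --conf spark.memory.offHeap.size=20g \
  --conf spark.sql.adaptive.enabled=false \
  --jars /path/to/gluten-velox-bundle.jar
```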
diff --git a/docs/index.md b/docs/index.md
index 9e5cc243a..fc66717bc 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -36,7 +36,7 @@ The basic rule of Gluten's design is that we would reuse
spark's whole control f
## 1.3 Target User
Gluten's target user is anyone who wants to accelerate SparkSQL fundamentally.
As a plugin to Spark, Gluten doesn't require any change to the DataFrame API or
SQL queries; it only requires users to set the correct configuration.
-See Gluten configuration properties
[here](https://github.com/oap-project/gluten/blob/main/docs/Configuration.md).
+See Gluten configuration properties
[here](https://github.com/apache/incubator-gluten/blob/main/docs/Configuration.md).
## 1.4 References
diff --git a/docs/release.md b/docs/release.md
index f8930ae43..a3f20bde8 100644
--- a/docs/release.md
+++ b/docs/release.md
@@ -4,11 +4,12 @@ title: Gluten Release
nav_order: 11
---
-[Gluten](https://github.com/oap-project/gluten) is a plugin for Apache Spark
to double SparkSQL's performance.
+[Gluten](https://github.com/apache/incubator-gluten) is a plugin for Apache
Spark to double SparkSQL's performance.
## Latest release for velox backend
-* [Gluten-1.1.0](https://github.com/oap-project/gluten/releases/tag/v1.1.0)
(Nov. 30 2023)
+*
[Gluten-1.1.1](https://github.com/apache/incubator-gluten/releases/tag/v1.1.1)
(Mar. 2 2024)
## Archived releases
-* [Gluten-1.0.0](https://github.com/oap-project/gluten/releases/tag/v1.0.0)
(Jul. 13 2023)
-* [Gluten-0.5.0](https://github.com/oap-project/gluten/releases/tag/0.5.0)
(Apr. 7 2023).
+*
[Gluten-1.1.0](https://github.com/apache/incubator-gluten/releases/tag/v1.1.0)
(Nov. 30 2023)
+*
[Gluten-1.0.0](https://github.com/apache/incubator-gluten/releases/tag/v1.0.0)
(Jul. 13 2023)
+*
[Gluten-0.5.0](https://github.com/apache/incubator-gluten/releases/tag/0.5.0)
(Apr. 7 2023)
diff --git a/docs/velox-backend-limitations.md
b/docs/velox-backend-limitations.md
index fb3b0f16f..7b03f3b2f 100644
--- a/docs/velox-backend-limitations.md
+++ b/docs/velox-backend-limitations.md
@@ -9,9 +9,9 @@ must fall back to vanilla spark, etc.
### Override of Spark classes (For Spark3.2 and Spark3.3)
Gluten avoids modifying Spark's existing code and uses Spark APIs where
possible. However, some APIs aren't exposed in vanilla Spark, so we have to
copy the Spark files and hardcode the changes. The list of overridden classes
can be found as ignoreClasses in package/pom.xml. If you use a customized
Spark, you should check whether those files are modified in your Spark;
otherwise your changes will be overridden.
-So you need to ensure preferentially load the Gluten jar to overwrite the jar
of vanilla spark. Refer to [How to prioritize loading Gluten jars in
Spark](https://github.com/oap-project/gluten/blob/main/docs/velox-backend-troubleshooting.md#incompatible-class-error-when-using-native-writer).
+So you need to ensure the Gluten jar is loaded preferentially so that it
overrides the corresponding vanilla Spark jar. Refer to [How to prioritize
loading Gluten jars in
Spark](https://github.com/apache/incubator-gluten/blob/main/docs/velox-backend-troubleshooting.md#incompatible-class-error-when-using-native-writer).
-If not officially supported spark3.2/3.3 version is used, NoSuchMethodError
can be thrown at runtime. More details see
[issue-4514](https://github.com/oap-project/gluten/issues/4514).
+If a Spark 3.2/3.3 version that is not officially supported is used, a
NoSuchMethodError can be thrown at runtime. For more details, see
[issue-4514](https://github.com/apache/incubator-gluten/issues/4514).
### Fallbacks
Except for the unsupported operators, functions, file formats, and data sources
listed in , there are some known cases that also fall back to vanilla Spark.
diff --git
a/gluten-core/src/main/scala/io/glutenproject/extension/ColumnarOverrides.scala
b/gluten-core/src/main/scala/io/glutenproject/extension/ColumnarOverrides.scala
index ff231585b..e3478720a 100644
---
a/gluten-core/src/main/scala/io/glutenproject/extension/ColumnarOverrides.scala
+++
b/gluten-core/src/main/scala/io/glutenproject/extension/ColumnarOverrides.scala
@@ -135,7 +135,8 @@ object ColumnarOverrideRules {
case ColumnarToRowExec(DummyColumnarOutputExec(_)) => false
case _ =>
throw new IllegalStateException(
- "This should not happen. Please leave a issue at
https://github.com/oap-project/gluten.")
+ "This should not happen. Please leave an issue at" +
+ " https://github.com/apache/incubator-gluten.")
}
def unwrap(plan: SparkPlan): SparkPlan = plan match {
@@ -145,7 +146,8 @@ object ColumnarOverrideRules {
case ColumnarToRowExec(DummyColumnarOutputExec(child)) => child
case _ =>
throw new IllegalStateException(
- "This should not happen. Please leave a issue at
https://github.com/oap-project/gluten.")
+ "This should not happen. Please leave an issue at" +
+ " https://github.com/apache/incubator-gluten.")
}
}
}
diff --git a/mkdocs.yml b/mkdocs.yml
index bdf856e07..1c03a1ce6 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -15,7 +15,7 @@
site_name: Gluten
repo_name: 'Fork on GitHub '
-repo_url: "https://github.com/oap-project/gluten.git"
+repo_url: "https://github.com/apache/incubator-gluten.git"
edit_uri: ""
diff --git a/pom.xml b/pom.xml
index 379d5434b..be1a3544d 100644
--- a/pom.xml
+++ b/pom.xml
@@ -20,7 +20,7 @@
<packaging>pom</packaging>
<name>Gluten Parent Pom</name>
- <url>https://github.com/oap-project/gluten.git</url>
+ <url>https://github.com/apache/incubator-gluten.git</url>
<licenses>
<license>
diff --git a/shims/common/src/main/scala/io/glutenproject/GlutenConfig.scala
b/shims/common/src/main/scala/io/glutenproject/GlutenConfig.scala
index bba16aa8b..87f5fc4f0 100644
--- a/shims/common/src/main/scala/io/glutenproject/GlutenConfig.scala
+++ b/shims/common/src/main/scala/io/glutenproject/GlutenConfig.scala
@@ -1602,28 +1602,28 @@ object GlutenConfig {
val DIRECTORY_SIZE_GUESS =
buildStaticConf("spark.gluten.sql.columnar.backend.velox.directorySizeGuess")
.internal()
- .doc(" Set the directory size guess for velox file scan")
+ .doc("Set the directory size guess for velox file scan")
.intConf
.createOptional
val FILE_PRELOAD_THRESHOLD =
buildStaticConf("spark.gluten.sql.columnar.backend.velox.filePreloadThreshold")
.internal()
- .doc(" Set the file preload threshold for velox file scan")
+ .doc("Set the file preload threshold for velox file scan")
.intConf
.createOptional
val PREFETCH_ROW_GROUPS =
buildStaticConf("spark.gluten.sql.columnar.backend.velox.prefetchRowGroups")
.internal()
- .doc(" Set the prefetch row groups for velox file scan")
+ .doc("Set the prefetch row groups for velox file scan")
.intConf
.createOptional
val LOAD_QUANTUM =
buildStaticConf("spark.gluten.sql.columnar.backend.velox.loadQuantum")
.internal()
- .doc(" Set the load quantum for velox file scan")
+ .doc("Set the load quantum for velox file scan")
.intConf
.createOptional
@@ -1637,14 +1637,14 @@ object GlutenConfig {
val MAX_COALESCED_BYTES =
buildStaticConf("spark.gluten.sql.columnar.backend.velox.maxCoalescedBytes")
.internal()
- .doc(" Set the max coalesced bytes for velox file scan")
+ .doc("Set the max coalesced bytes for velox file scan")
.intConf
.createOptional
val CACHE_PREFETCH_MINPCT =
buildStaticConf("spark.gluten.sql.columnar.backend.velox.cachePrefetchMinPct")
.internal()
- .doc(" Set prefetch cache min pct for velox file scan")
+ .doc("Set prefetch cache min pct for velox file scan")
.intConf
.createOptional
@@ -1658,7 +1658,7 @@ object GlutenConfig {
val VELOX_ORC_SCAN_ENABLED =
buildStaticConf("spark.gluten.sql.columnar.backend.velox.orc.scan.enabled")
.internal()
- .doc(" Enable velox orc scan. If disabled, vanilla spark orc scan will
be used.")
+ .doc("Enable velox orc scan. If disabled, vanilla spark orc scan will be
used.")
.booleanConf
.createWithDefault(true)
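
The static confs whose docs are cleaned up above are ordinary Spark properties,
so (as an illustrative sketch, not part of this commit) they could be supplied
at submit time like any other config; the numeric values below are made-up
placeholders:

```shell
# Hypothetical example of setting the Velox scan tuning knobs documented above.
# All values and the application jar name are placeholders for illustration.
spark-submit \
  --conf spark.gluten.sql.columnar.backend.velox.loadQuantum=268435456 \
  --conf spark.gluten.sql.columnar.backend.velox.maxCoalescedBytes=67108864 \
  --conf spark.gluten.sql.columnar.backend.velox.orc.scan.enabled=true \
  your-app.jar
```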
diff --git a/tools/gluten-it/README.md b/tools/gluten-it/README.md
index 39e061702..59ae55e14 100644
--- a/tools/gluten-it/README.md
+++ b/tools/gluten-it/README.md
@@ -6,13 +6,13 @@ The project makes it easy to test Gluten build locally.
Gluten is a native Spark SQL implementation, delivered as a standard Spark plug-in.
-https://github.com/oap-project/gluten
+https://github.com/apache/incubator-gluten
## Getting Started
### 1. Install Gluten in your local machine
-See official Gluten build guidance
https://github.com/oap-project/gluten#how-to-use-gluten
+See the official Gluten build guidance at
https://github.com/apache/incubator-gluten#how-to-use-gluten
### 2. Install and run gluten-it with Spark version
diff --git a/tools/gluten-te/centos/defaults.conf
b/tools/gluten-te/centos/defaults.conf
index 19e1b238a..1213ff66d 100755
--- a/tools/gluten-te/centos/defaults.conf
+++ b/tools/gluten-te/centos/defaults.conf
@@ -11,7 +11,7 @@ DEFAULT_NON_INTERACTIVE=OFF
DEFAULT_PRESERVE_CONTAINER=OFF
# The code will be used in the build
-DEFAULT_GLUTEN_REPO=https://github.com/oap-project/gluten.git
+DEFAULT_GLUTEN_REPO=https://github.com/apache/incubator-gluten.git
DEFAULT_GLUTEN_BRANCH=main
# Create debug build
diff --git a/tools/gluten-te/ubuntu/README.md b/tools/gluten-te/ubuntu/README.md
index 328b5108e..f617d8368 100644
--- a/tools/gluten-te/ubuntu/README.md
+++ b/tools/gluten-te/ubuntu/README.md
@@ -1,6 +1,6 @@
# Portable Test Environment of Gluten (gluten-te)
-Build and run [gluten](https://github.com/oap-project/gluten) and
[gluten-it](https://github.com/oap-project/gluten/tree/main/tools/gluten-it) in
a portable docker container, from scratch.
+Build and run [gluten](https://github.com/apache/incubator-gluten) and
[gluten-it](https://github.com/apache/incubator-gluten/tree/main/tools/gluten-it)
in a portable docker container, from scratch.
# Prerequisites
@@ -9,7 +9,7 @@ Only Linux and MacOS are currently supported. Before running
the scripts, make s
# Getting Started (Build Gluten code, Velox backend)
```sh
-git clone -b main https://github.com/oap-project/gluten.git gluten # Gluten
main code
+git clone -b main https://github.com/apache/incubator-gluten.git gluten #
Gluten main code
export HTTP_PROXY_HOST=myproxy.example.com # in case you are behind http proxy
export HTTP_PROXY_PORT=55555 # in case you are behind http proxy
@@ -21,7 +21,7 @@ tools/gluten-te/ubuntu/examples/buildhere-veloxbe/run.sh
# Getting Started (TPC, Velox backend)
```sh
-git clone -b main https://github.com/oap-project/gluten.git gluten # Gluten
main code
+git clone -b main https://github.com/apache/incubator-gluten.git gluten #
Gluten main code
export HTTP_PROXY_HOST=myproxy.example.com # in case you are behind http proxy
export HTTP_PROXY_PORT=55555 # in case you are behind http proxy
@@ -32,7 +32,7 @@ cd gluten/gluten-te
# Configurations
-See the [config
file](https://github.com/oap-project/gluten/blob/main/tools/gluten-te/ubuntu/defaults.conf).
You can modify the file to configure gluten-te, or pass env variables during
running the scripts.
+See the [config
file](https://github.com/apache/incubator-gluten/blob/main/tools/gluten-te/ubuntu/defaults.conf).
You can modify the file to configure gluten-te, or pass env variables during
running the scripts.
# Example Usages
diff --git a/tools/gluten-te/ubuntu/defaults.conf
b/tools/gluten-te/ubuntu/defaults.conf
index 4f4904ad6..2656b1cfa 100644
--- a/tools/gluten-te/ubuntu/defaults.conf
+++ b/tools/gluten-te/ubuntu/defaults.conf
@@ -11,7 +11,7 @@ DEFAULT_NON_INTERACTIVE=OFF
DEFAULT_PRESERVE_CONTAINER=OFF
# The code will be used in the build
-DEFAULT_GLUTEN_REPO=https://github.com/oap-project/gluten.git
+DEFAULT_GLUTEN_REPO=https://github.com/apache/incubator-gluten.git
DEFAULT_GLUTEN_BRANCH=main
# Create debug build
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]