This is an automated email from the ASF dual-hosted git repository.
felixybw pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/incubator-gluten.git
The following commit(s) were added to refs/heads/main by this push:
new c19175331 remove duplicate content for local cache part (#5535)
c19175331 is described below
commit c19175331469f1ff3c4223e5b0e354336322722b
Author: 高阳阳 <[email protected]>
AuthorDate: Fri Apr 26 08:40:45 2024 +0800
remove duplicate content for local cache part (#5535)
remove duplicate content from doc
---
docs/get-started/VeloxABFS.md | 14 +-------------
docs/get-started/{VeloxABFS.md => VeloxLocalCache.md} | 19 ++-----------------
docs/get-started/VeloxS3.md | 14 +-------------
3 files changed, 4 insertions(+), 43 deletions(-)
diff --git a/docs/get-started/VeloxABFS.md b/docs/get-started/VeloxABFS.md
index 9bb9c8332..6e0882423 100644
--- a/docs/get-started/VeloxABFS.md
+++ b/docs/get-started/VeloxABFS.md
@@ -20,16 +20,4 @@
spark.hadoop.fs.azure.account.key.<storage-account>.dfs.core.windows.net XXXXXX
# Local Caching support
-Velox supports a local cache when reading data from HDFS/S3/ABFS. With this
feature, Velox can asynchronously cache the data on local disk when reading
from remote storage and future read requests on previously cached blocks will
be serviced from local cache files. To enable the local caching feature, the
following configurations are required:
-
-```
-spark.gluten.sql.columnar.backend.velox.cacheEnabled // enable or disable
velox cache, default false.
-spark.gluten.sql.columnar.backend.velox.memCacheSize // the total size of
in-mem cache, default is 128MB.
-spark.gluten.sql.columnar.backend.velox.ssdCachePath // the folder to
store the cache files, default is "/tmp".
-spark.gluten.sql.columnar.backend.velox.ssdCacheSize // the total size of
the SSD cache, default is 128MB. Velox will do in-mem cache only if this value
is 0.
-spark.gluten.sql.columnar.backend.velox.ssdCacheShards // the shards of the
SSD cache, default is 1.
-spark.gluten.sql.columnar.backend.velox.ssdCacheIOThreads // the IO threads
for cache promoting, default is 1. Velox will try to do "read-ahead" if this
value is bigger than 1
-spark.gluten.sql.columnar.backend.velox.ssdODirect // enable or disable
O_DIRECT on cache write, default false.
-```
-
-It's recommended to mount SSDs to the cache path to get the best performance
of local caching. Cache files will be written to
"spark.gluten.sql.columnar.backend.velox.cachePath", with UUID based suffix,
e.g. "/tmp/cache.13e8ab65-3af4-46ac-8d28-ff99b2a9ec9b0". Gluten cannot reuse
older caches for now, and the old cache files are left after Spark context
shutdown.
+Velox supports a local cache when reading data from ABFS. Please refer [Velox
Local Cache](VeloxLocalCache.md) part for more detailed configurations.
\ No newline at end of file
diff --git a/docs/get-started/VeloxABFS.md b/docs/get-started/VeloxLocalCache.md
similarity index 64%
copy from docs/get-started/VeloxABFS.md
copy to docs/get-started/VeloxLocalCache.md
index 9bb9c8332..1c7c40ced 100644
--- a/docs/get-started/VeloxABFS.md
+++ b/docs/get-started/VeloxLocalCache.md
@@ -1,24 +1,9 @@
---
layout: page
-title: Using ABFS with Gluten
-nav_order: 6
+title: Velox Local Caching
+nav_order: 7
parent: Getting-Started
---
-ABFS is an important data store for big data users. This doc discusses config
details and use cases of Gluten with ABFS. To use an ABFS account as your data
source, please ensure you use the listed ABFS config in your
spark-defaults.conf. If you would like to authenticate with ABFS using
additional auth mechanisms, please reach out using the 'Issues' tab.
-
-# Working with ABFS
-
-## Configuring ABFS Access Token
-
-To configure access to your storage account, replace <storage-account> with
the name of your account. This property aligns with Spark configurations. By
setting this config multiple times using different storage account names, you
can access multiple ABFS accounts.
-
-```sh
-spark.hadoop.fs.azure.account.key.<storage-account>.dfs.core.windows.net
XXXXXXXXX
-```
-
-### Other authentatication methods are not yet supported.
-
-# Local Caching support
Velox supports a local cache when reading data from HDFS/S3/ABFS. With this
feature, Velox can asynchronously cache the data on local disk when reading
from remote storage and future read requests on previously cached blocks will
be serviced from local cache files. To enable the local caching feature, the
following configurations are required:
diff --git a/docs/get-started/VeloxS3.md b/docs/get-started/VeloxS3.md
index 2ece52b2f..c57bf6da6 100644
--- a/docs/get-started/VeloxS3.md
+++ b/docs/get-started/VeloxS3.md
@@ -58,16 +58,4 @@ You can change log granularity of AWS C++ SDK by setting the
`spark.gluten.velox
# Local Caching support
-Velox supports a local cache when reading data from HDFS/S3. The feature is
very useful if remote storage is slow, e.g., reading from a public S3 bucket
and stronger performance is desired. With this feature, Velox can
asynchronously cache the data on local disk when reading from remote storage,
and the future reading requests on already cached blocks will be serviced from
local cache files. To enable the local caching feature, below configurations
are required:
-
-```
-spark.gluten.sql.columnar.backend.velox.cacheEnabled // enable or disable
velox cache, default false.
-spark.gluten.sql.columnar.backend.velox.memCacheSize // the total size of
in-mem cache, default is 128MB.
-spark.gluten.sql.columnar.backend.velox.ssdCachePath // the folder to
store the cache files, default is "/tmp".
-spark.gluten.sql.columnar.backend.velox.ssdCacheSize // the total size of
the SSD cache, default is 128MB. Velox will do in-mem cache only if this value
is 0.
-spark.gluten.sql.columnar.backend.velox.ssdCacheShards // the shards of the
SSD cache, default is 1.
-spark.gluten.sql.columnar.backend.velox.ssdCacheIOThreads // the IO threads
for cache promoting, default is 1. Velox will try to do "read-ahead" if this
value is bigger than 1
-spark.gluten.sql.columnar.backend.velox.ssdODirect // enable or disable
O_DIRECT on cache write, default false.
-```
-
-It's recommended to mount SSDs to the cache path to get the best performance
of local caching. On the start up of Spark context, the cache files will be
allocated under "spark.gluten.sql.columnar.backend.velox.cachePath", with UUID
based suffix, e.g. "/tmp/cache.13e8ab65-3af4-46ac-8d28-ff99b2a9ec9b0". Gluten
is not able to reuse older caches for now, and the old cache files are left
there after Spark context shutdown.
+Velox supports a local cache when reading data from S3. Please refer [Velox
Local Cache](VeloxLocalCache.md) part for more detailed configurations.
\ No newline at end of file