(incubator-gluten) branch main updated: remove duplicate content for local cache part (#5535)

felixybw Thu, 25 Apr 2024 17:40:55 -0700

This is an automated email from the ASF dual-hosted git repository.

felixybw pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/incubator-gluten.git



The following commit(s) were added to refs/heads/main by this push:
     new c19175331 remove duplicate content for local cache part (#5535)
c19175331 is described below

commit c19175331469f1ff3c4223e5b0e354336322722b
Author: 高阳阳 <[email protected]>
AuthorDate: Fri Apr 26 08:40:45 2024 +0800

    remove duplicate content for local cache part (#5535)
    
     remove duplicate content from doc
---
 docs/get-started/VeloxABFS.md                         | 14 +-------------
 docs/get-started/{VeloxABFS.md => VeloxLocalCache.md} | 19 ++-----------------
 docs/get-started/VeloxS3.md                           | 14 +-------------
 3 files changed, 4 insertions(+), 43 deletions(-)

diff --git a/docs/get-started/VeloxABFS.md b/docs/get-started/VeloxABFS.md
index 9bb9c8332..6e0882423 100644
--- a/docs/get-started/VeloxABFS.md
+++ b/docs/get-started/VeloxABFS.md
@@ -20,16 +20,4 @@ spark.hadoop.fs.azure.account.key.<storage-account>.dfs.core.windows.net  XXXXXX
 
 # Local Caching support
 
-Velox supports a local cache when reading data from HDFS/S3/ABFS. With this feature, Velox can asynchronously cache the data on local disk when reading from remote storage and future read requests on previously cached blocks will be serviced from local cache files. To enable the local caching feature, the following configurations are required:
-
-```
-spark.gluten.sql.columnar.backend.velox.cacheEnabled      // enable or disable velox cache, default false.
-spark.gluten.sql.columnar.backend.velox.memCacheSize      // the total size of in-mem cache, default is 128MB.
-spark.gluten.sql.columnar.backend.velox.ssdCachePath      // the folder to store the cache files, default is "/tmp".
-spark.gluten.sql.columnar.backend.velox.ssdCacheSize      // the total size of the SSD cache, default is 128MB. Velox will do in-mem cache only if this value is 0.
-spark.gluten.sql.columnar.backend.velox.ssdCacheShards    // the shards of the SSD cache, default is 1.
-spark.gluten.sql.columnar.backend.velox.ssdCacheIOThreads // the IO threads for cache promoting, default is 1. Velox will try to do "read-ahead" if this value is bigger than 1 
-spark.gluten.sql.columnar.backend.velox.ssdODirect        // enable or disable O_DIRECT on cache write, default false.
-```
-
-It's recommended to mount SSDs to the cache path to get the best performance of local caching. Cache files will be written to "spark.gluten.sql.columnar.backend.velox.cachePath", with UUID based suffix, e.g. "/tmp/cache.13e8ab65-3af4-46ac-8d28-ff99b2a9ec9b0". Gluten cannot reuse older caches for now, and the old cache files are left after Spark context shutdown.
+Velox supports a local cache when reading data from ABFS. Please refer [Velox Local Cache](VeloxLocalCache.md) part for more detailed configurations.
\ No newline at end of file
diff --git a/docs/get-started/VeloxABFS.md b/docs/get-started/VeloxLocalCache.md
similarity index 64%
copy from docs/get-started/VeloxABFS.md
copy to docs/get-started/VeloxLocalCache.md
index 9bb9c8332..1c7c40ced 100644
--- a/docs/get-started/VeloxABFS.md
+++ b/docs/get-started/VeloxLocalCache.md
@@ -1,24 +1,9 @@
 ---
 layout: page
-title: Using ABFS with Gluten
-nav_order: 6
+title: Velox Local Caching
+nav_order: 7
 parent: Getting-Started
 ---
-ABFS is an important data store for big data users. This doc discusses config details and use cases of Gluten with ABFS. To use an ABFS account as your data source, please ensure you use the listed ABFS config in your spark-defaults.conf. If you would like to authenticate with ABFS using additional auth mechanisms, please reach out using the 'Issues' tab.
-
-# Working with ABFS
-
-## Configuring ABFS Access Token
-
-To configure access to your storage account, replace <storage-account> with the name of your account. This property aligns with Spark configurations. By setting this config multiple times using different storage account names, you can access multiple ABFS accounts.
-
-```sh
-spark.hadoop.fs.azure.account.key.<storage-account>.dfs.core.windows.net  XXXXXXXXX
-```
-
-### Other authentatication methods are not yet supported.
-
-# Local Caching support
 
 Velox supports a local cache when reading data from HDFS/S3/ABFS. With this feature, Velox can asynchronously cache the data on local disk when reading from remote storage and future read requests on previously cached blocks will be serviced from local cache files. To enable the local caching feature, the following configurations are required:
 
diff --git a/docs/get-started/VeloxS3.md b/docs/get-started/VeloxS3.md
index 2ece52b2f..c57bf6da6 100644
--- a/docs/get-started/VeloxS3.md
+++ b/docs/get-started/VeloxS3.md
@@ -58,16 +58,4 @@ You can change log granularity of AWS C++ SDK by setting the `spark.gluten.velox
 
 # Local Caching support
 
-Velox supports a local cache when reading data from HDFS/S3. The feature is very useful if remote storage is slow, e.g., reading from a public S3 bucket and stronger performance is desired. With this feature, Velox can asynchronously cache the data on local disk when reading from remote storage, and the future reading requests on already cached blocks will be serviced from local cache files. To enable the local caching feature, below configurations are required:
-
-```
-spark.gluten.sql.columnar.backend.velox.cacheEnabled      // enable or disable velox cache, default false.
-spark.gluten.sql.columnar.backend.velox.memCacheSize      // the total size of in-mem cache, default is 128MB.
-spark.gluten.sql.columnar.backend.velox.ssdCachePath      // the folder to store the cache files, default is "/tmp".
-spark.gluten.sql.columnar.backend.velox.ssdCacheSize      // the total size of the SSD cache, default is 128MB. Velox will do in-mem cache only if this value is 0.
-spark.gluten.sql.columnar.backend.velox.ssdCacheShards    // the shards of the SSD cache, default is 1.
-spark.gluten.sql.columnar.backend.velox.ssdCacheIOThreads // the IO threads for cache promoting, default is 1. Velox will try to do "read-ahead" if this value is bigger than 1 
-spark.gluten.sql.columnar.backend.velox.ssdODirect        // enable or disable O_DIRECT on cache write, default false.
-```
-
-It's recommended to mount SSDs to the cache path to get the best performance of local caching. On the start up of Spark context, the cache files will be allocated under "spark.gluten.sql.columnar.backend.velox.cachePath", with UUID based suffix, e.g. "/tmp/cache.13e8ab65-3af4-46ac-8d28-ff99b2a9ec9b0". Gluten is not able to reuse older caches for now, and the old cache files are left there after Spark context shutdown.
+Velox supports a local cache when reading data from S3. Please refer [Velox Local Cache](VeloxLocalCache.md) part for more detailed configurations.
\ No newline at end of file


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
