yuqi1129 commented on code in PR #8450:
URL: https://github.com/apache/gravitino/pull/8450#discussion_r2330261117


##########
docs/manage-statistics-in-gravitino.md:
##########
@@ -245,13 +245,16 @@ For example, if you set an extra property `foo` to `bar` for Lance storage optio
 For Lance remote storage, you can refer to the document [here](https://lancedb.github.io/lance/usage/storage/).
 
 
-| Configuration item                                        | Description                          | Default value                  | Required                                        | Since version |
-|-----------------------------------------------------------|--------------------------------------|--------------------------------|-------------------------------------------------|---------------|
-| `gravitino.stats.partition.storageOption.location`        | The location of Lance files          | `${GRAVITINO_HOME}/data/lance` | No                                              | 1.0.0         |
-| `gravitino.stats.partition.storageOption.maxRowsPerFile`  | The maximum rows per file            | `1000000`                      | No                                              | 1.0.0         |
-| `gravitino.stats.partition.storageOption.maxBytesPerFile` | The maximum bytes per file           | `104857600`                    | No                                              | 1.0.0         |
-| `gravitino.stats.partition.storageOption.maxRowsPerGroup` | The maximum rows per group           | `1000000`                      | No                                              | 1.0.0         |
-| `gravitino.stats.partition.storageOption.readBatchSize`   | The batch record number when reading | `10000`                        | No                                              | 1.0.0         |
+| Configuration item                                                   | Description                          | Default value                  | Required                                        | Since version |
+|----------------------------------------------------------------------|--------------------------------------|--------------------------------|-------------------------------------------------|---------------|

Review Comment:
   Please reduce the number of `-` characters in the separator row for the 
`Default value` column.



##########
core/src/main/java/org/apache/gravitino/stats/storage/LancePartitionStatisticStorage.java:
##########
@@ -133,7 +153,45 @@ public LancePartitionStatisticStorage(Map<String, String> properties) {
             properties.getOrDefault(READ_BATCH_SIZE, String.valueOf(DEFAULT_READ_BATCH_SIZE)));
     Preconditions.checkArgument(
         readBatchSize > 0, "Lance partition statistics storage readBatchSize must be positive");
+    int datasetCacheSize =
+        Integer.parseInt(
+            properties.getOrDefault(
+                DATASET_CACHE_SIZE, String.valueOf(DEFAULT_DATASET_CACHE_SIZE)));
+    Preconditions.checkArgument(
+        datasetCacheSize > 0,
+        "Lance partition statistics storage datasetCacheSize must be positive");
+    this.metadataFileCacheSize =
+        Long.parseLong(
+            properties.getOrDefault(
+                METADATA_FILE_CACHE_SIZE, String.valueOf(DEFAULT_METADATA_FILE_CACHE_SIZE)));
+    Preconditions.checkArgument(
+        metadataFileCacheSize > 0,
+        "Lance partition statistics storage metadataFileCacheSizeBytes must be positive");
+    this.indexCacheSize =
+        Long.parseLong(
+            properties.getOrDefault(INDEX_CACHE_SIZE, String.valueOf(DEFAULT_INDEX_CACHE_SIZE)));
+    Preconditions.checkArgument(
+        indexCacheSize > 0,
+        "Lance partition statistics storage indexCacheSizeBytes must be positive");
+
     this.properties = properties;
+
+    this.cache =
+        Caffeine.newBuilder()
+            .maximumSize(datasetCacheSize)

Review Comment:
   Will all datasets be cached with no expiration time? I'm not sure whether 
10000 open file handles is a large number or not.
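
   To make the concern above concrete, here is a minimal sketch of a cache that is bounded by both entry count and idle time, and that closes a handle when it is evicted. It is not the PR's implementation: it uses only the JDK instead of Caffeine, and `ExpiringHandleCache`, `FakeDataset`, and the injected clock are hypothetical names introduced for illustration. With Caffeine, the builder's `expireAfterAccess` and `removalListener` options could play the same roles.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical stand-in for an open dataset handle; only close() matters here. */
final class FakeDataset implements AutoCloseable {
  final String name;
  boolean closed = false;

  FakeDataset(String name) {
    this.name = name;
  }

  @Override
  public void close() {
    closed = true;
  }
}

/** LRU cache bounded by entry count and idle time; evicted values are closed. */
final class ExpiringHandleCache<K, V extends AutoCloseable> {
  private static final class Entry<V> {
    final V value;
    long lastAccessNanos;

    Entry(V value, long now) {
      this.value = value;
      this.lastAccessNanos = now;
    }
  }

  private final int maxSize;
  private final long idleNanos;
  private final LongSupplier clock; // assumed monotonic, e.g. System::nanoTime
  // accessOrder = true keeps the least recently used entry first.
  private final LinkedHashMap<K, Entry<V>> entries = new LinkedHashMap<>(16, 0.75f, true);

  ExpiringHandleCache(int maxSize, long idleNanos, LongSupplier clock) {
    this.maxSize = maxSize;
    this.idleNanos = idleNanos;
    this.clock = clock;
  }

  synchronized V get(K key, Function<K, V> loader) {
    long now = clock.getAsLong();
    evictIdle(now);
    Entry<V> entry = entries.get(key); // a hit moves the entry to the most recent position
    if (entry == null) {
      entry = new Entry<>(loader.apply(key), now);
      entries.put(key, entry);
      evictOverflow();
    }
    entry.lastAccessNanos = now;
    return entry.value;
  }

  synchronized int size() {
    return entries.size();
  }

  // Entries are in access order, so the scan can stop at the first fresh entry.
  private void evictIdle(long now) {
    Iterator<Entry<V>> it = entries.values().iterator();
    while (it.hasNext()) {
      Entry<V> entry = it.next();
      if (now - entry.lastAccessNanos < idleNanos) {
        break;
      }
      it.remove();
      closeQuietly(entry.value);
    }
  }

  // Drops least recently used entries until the size bound holds again.
  private void evictOverflow() {
    Iterator<Entry<V>> it = entries.values().iterator();
    while (entries.size() > maxSize && it.hasNext()) {
      Entry<V> entry = it.next();
      it.remove();
      closeQuietly(entry.value);
    }
  }

  private void closeQuietly(V value) {
    try {
      value.close();
    } catch (Exception e) {
      // Sketch only: a real implementation would log the failure.
    }
  }
}
```

   Idle entries are only swept on the next `get`, so a quiet cache keeps its handles open until it is touched again. Closing on eviction also assumes no caller is still using the evicted handle; if handles are shared across threads, some form of reference counting would be needed.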



##########
docs/manage-statistics-in-gravitino.md:
##########
@@ -245,13 +245,16 @@ For example, if you set an extra property `foo` to `bar` for Lance storage optio
 For Lance remote storage, you can refer to the document [here](https://lancedb.github.io/lance/usage/storage/).
 
 
-| Configuration item                                        | Description                          | Default value                  | Required                                        | Since version |
-|-----------------------------------------------------------|--------------------------------------|--------------------------------|-------------------------------------------------|---------------|
-| `gravitino.stats.partition.storageOption.location`        | The location of Lance files          | `${GRAVITINO_HOME}/data/lance` | No                                              | 1.0.0         |
-| `gravitino.stats.partition.storageOption.maxRowsPerFile`  | The maximum rows per file            | `1000000`                      | No                                              | 1.0.0         |
-| `gravitino.stats.partition.storageOption.maxBytesPerFile` | The maximum bytes per file           | `104857600`                    | No                                              | 1.0.0         |
-| `gravitino.stats.partition.storageOption.maxRowsPerGroup` | The maximum rows per group           | `1000000`                      | No                                              | 1.0.0         |
-| `gravitino.stats.partition.storageOption.readBatchSize`   | The batch record number when reading | `10000`                        | No                                              | 1.0.0         |
+| Configuration item                                                   | Description                          | Default value                  | Required                                        | Since version |
+|----------------------------------------------------------------------|--------------------------------------|--------------------------------|-------------------------------------------------|---------------|
+| `gravitino.stats.partition.storageOption.location`                   | The location of Lance files          | `${GRAVITINO_HOME}/data/lance` | No                                              | 1.0.0         |
+| `gravitino.stats.partition.storageOption.maxRowsPerFile`             | The maximum rows per file            | `1000000`                      | No                                              | 1.0.0         |
+| `gravitino.stats.partition.storageOption.maxBytesPerFile`            | The maximum bytes per file           | `104857600`                    | No                                              | 1.0.0         |
+| `gravitino.stats.partition.storageOption.maxRowsPerGroup`            | The maximum rows per group           | `1000000`                      | No                                              | 1.0.0         |
+| `gravitino.stats.partition.storageOption.readBatchSize`              | The batch record number when reading | `10000`                        | No                                              | 1.0.0         |
+| `gravitino.stats.partition.storageOption.datasetCacheSize`           | The dataset of Lance cache size      | `10000`                        | No                                              | 1.0.0         |

Review Comment:
   The dataset of Lance cache size -> `size of dataset cache for Lance`



##########
gradle/libs.versions.toml:
##########
@@ -29,7 +29,7 @@ guava = "32.1.3-jre"
 lombok = "1.18.20"
 slf4j = "2.0.9"
 log4j = "2.24.3"
-lance = "0.31.0"
+lance = "0.34.0"

Review Comment:
   Has Lance evolved so quickly?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
