yuqi1129 commented on code in PR #8450: URL: https://github.com/apache/gravitino/pull/8450#discussion_r2330261117
##########
docs/manage-statistics-in-gravitino.md:
##########
@@ -245,13 +245,16 @@ For example, if you set an extra property `foo` to `bar` for Lance storage optio
 For Lance remote storage, you can refer to the document [here](https://lancedb.github.io/lance/usage/storage/).
 
-| Configuration item | Description | Default value | Required | Since version |
-|-----------------------------------------------------------|--------------------------------------|--------------------------------|-------------------------------------------------|---------------|
-| `gravitino.stats.partition.storageOption.location` | The location of Lance files | `${GRAVITINO_HOME}/data/lance` | No | 1.0.0 |
-| `gravitino.stats.partition.storageOption.maxRowsPerFile` | The maximum rows per file | `1000000` | No | 1.0.0 |
-| `gravitino.stats.partition.storageOption.maxBytesPerFile` | The maximum bytes per file | `104857600` | No | 1.0.0 |
-| `gravitino.stats.partition.storageOption.maxRowsPerGroup` | The maximum rows per group | `1000000` | No | 1.0.0 |
-| `gravitino.stats.partition.storageOption.readBatchSize` | The batch record number when reading | `10000` | No | 1.0.0 |
+| Configuration item | Description | Default value | Required | Since version |
+|----------------------------------------------------------------------|--------------------------------------|--------------------------------|-------------------------------------------------|---------------|

Review Comment:
Please reduce the number of `--` in column `Default value`

##########
core/src/main/java/org/apache/gravitino/stats/storage/LancePartitionStatisticStorage.java:
##########
@@ -133,7 +153,45 @@ public LancePartitionStatisticStorage(Map<String, String> properties) {
             properties.getOrDefault(READ_BATCH_SIZE, String.valueOf(DEFAULT_READ_BATCH_SIZE)));
     Preconditions.checkArgument(
         readBatchSize > 0, "Lance partition statistics storage readBatchSize must be positive");
+    int datasetCacheSize =
+        Integer.parseInt(
+            properties.getOrDefault(
+                DATASET_CACHE_SIZE, String.valueOf(DEFAULT_DATASET_CACHE_SIZE)));
+    Preconditions.checkArgument(
+        datasetCacheSize > 0,
+        "Lance partition statistics storage datasetCacheSize must be positive");
+    this.metadataFileCacheSize =
+        Long.parseLong(
+            properties.getOrDefault(
+                METADATA_FILE_CACHE_SIZE, String.valueOf(DEFAULT_METADATA_FILE_CACHE_SIZE)));
+    Preconditions.checkArgument(
+        metadataFileCacheSize > 0,
+        "Lance partition statistics storage metadataFileCacheSizeBytes must be positive");
+    this.indexCacheSize =
+        Long.parseLong(
+            properties.getOrDefault(INDEX_CACHE_SIZE, String.valueOf(DEFAULT_INDEX_CACHE_SIZE)));
+    Preconditions.checkArgument(
+        indexCacheSize > 0,
+        "Lance partition statistics storage indexCacheSizeBytes must be positive");
+    this.properties = properties;
+
+    this.cache =
+        Caffeine.newBuilder()
+            .maximumSize(datasetCacheSize)

Review Comment:
Will all datasets be cached, with no expiration time associated with them? I'm not sure whether 10000 file handles is a large number or not.

##########
docs/manage-statistics-in-gravitino.md:
##########
@@ -245,13 +245,16 @@ For example, if you set an extra property `foo` to `bar` for Lance storage optio
 For Lance remote storage, you can refer to the document [here](https://lancedb.github.io/lance/usage/storage/).
 
-| Configuration item | Description | Default value | Required | Since version |
-|-----------------------------------------------------------|--------------------------------------|--------------------------------|-------------------------------------------------|---------------|
-| `gravitino.stats.partition.storageOption.location` | The location of Lance files | `${GRAVITINO_HOME}/data/lance` | No | 1.0.0 |
-| `gravitino.stats.partition.storageOption.maxRowsPerFile` | The maximum rows per file | `1000000` | No | 1.0.0 |
-| `gravitino.stats.partition.storageOption.maxBytesPerFile` | The maximum bytes per file | `104857600` | No | 1.0.0 |
-| `gravitino.stats.partition.storageOption.maxRowsPerGroup` | The maximum rows per group | `1000000` | No | 1.0.0 |
-| `gravitino.stats.partition.storageOption.readBatchSize` | The batch record number when reading | `10000` | No | 1.0.0 |
+| Configuration item | Description | Default value | Required | Since version |
+|----------------------------------------------------------------------|--------------------------------------|--------------------------------|-------------------------------------------------|---------------|
+| `gravitino.stats.partition.storageOption.location` | The location of Lance files | `${GRAVITINO_HOME}/data/lance` | No | 1.0.0 |
+| `gravitino.stats.partition.storageOption.maxRowsPerFile` | The maximum rows per file | `1000000` | No | 1.0.0 |
+| `gravitino.stats.partition.storageOption.maxBytesPerFile` | The maximum bytes per file | `104857600` | No | 1.0.0 |
+| `gravitino.stats.partition.storageOption.maxRowsPerGroup` | The maximum rows per group | `1000000` | No | 1.0.0 |
+| `gravitino.stats.partition.storageOption.readBatchSize` | The batch record number when reading | `10000` | No | 1.0.0 |
+| `gravitino.stats.partition.storageOption.datasetCacheSize` | The dataset of Lance cache size | `10000` | No | 1.0.0 |

Review Comment:
The dataset of Lance cache size -> `size of dataset cache for Lance`

##########
gradle/libs.versions.toml:
##########
@@ -29,7 +29,7 @@ guava = "32.1.3-jre"
 lombok = "1.18.20"
 slf4j = "2.0.9"
 log4j = "2.24.3"
-lance = "0.31.0"
+lance = "0.34.0"

Review Comment:
Has Lance evolved so quickly?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
