Xuanwo opened a new issue, #1226:
URL: https://github.com/apache/iceberg-rust/issues/1226
### What feature are you trying to implement?
Caching is an essential part of an Iceberg table, and different types of
caches are needed at different levels.
For example, for our table metadata, we will need a `Manifest` cache so that
we don't have to read and deserialize the same manifest files repeatedly. For
our Parquet files, we will need a `FileMetadata` cache to avoid parsing the
metadata from the Parquet files each time. We could even implement a raw data
cache to store portions of data files, eliminating the need to download them
from S3 again.
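To illustrate the manifest case, here is a minimal sketch of a get-or-load cache. It is not the proposed API: `ManifestCache`, `get_or_load`, and the simplified `Manifest` struct are hypothetical names made up for this example, and the real `Manifest` type in iceberg-rust is different.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for a deserialized manifest (not the real iceberg-rust type).
#[derive(Debug, PartialEq)]
pub struct Manifest {
    pub path: String,
    pub entry_count: usize,
}

/// Illustrative unbounded in-memory cache keyed by manifest file path.
#[derive(Default)]
pub struct ManifestCache {
    entries: Mutex<HashMap<String, Arc<Manifest>>>,
}

impl ManifestCache {
    /// Returns the cached manifest for `path`, calling `load` only on a miss.
    /// The lock is held while loading, which keeps the sketch simple but
    /// serializes concurrent loads.
    pub fn get_or_load<F>(&self, path: &str, load: F) -> Arc<Manifest>
    where
        F: FnOnce() -> Manifest,
    {
        let mut entries = self.entries.lock().unwrap();
        if let Some(hit) = entries.get(path) {
            return Arc::clone(hit);
        }
        let manifest = Arc::new(load());
        entries.insert(path.to_string(), Arc::clone(&manifest));
        manifest
    }
}
```

With something like this, a second request for the same manifest path returns the shared `Arc` without reading or deserializing the file again.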
As the foundation for various query engines, iceberg-rust should be designed
to simplify integration while still allowing each engine to fully optimize
performance. This applies whether they are using iceberg-rust on a single
machine or within a distributed cluster.
I plan to add a set of cache APIs to meet all those needs. My current plan
is:
- `ObjectCache`: an object cache trait that can hold objects like `Manifest`
or `FileMetadata`
- `BytesCache`: a bytes cache that can hold the raw content of files, like
`table_metadata.json` files.
- An in-FileIO cache, like OpenDAL's CacheLayer; the API is not decided yet.
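As a rough sketch of how the first two items could look, the block below defines two traits and one in-memory implementation. The method names (`get`, `set`), the generic parameters, and `MemoryCache` are all assumptions made for illustration; the actual design is in the linked PRs and may differ. The FileIO-level cache is left out because its API is still undecided.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::sync::{Arc, RwLock};

/// Hypothetical object cache: holds already-deserialized values such as
/// `Manifest` or `FileMetadata` behind an `Arc`.
pub trait ObjectCache<K, V>: Send + Sync {
    fn get(&self, key: &K) -> Option<Arc<V>>;
    fn set(&self, key: K, value: Arc<V>);
}

/// Hypothetical bytes cache: holds the raw content of files, keyed by path.
pub trait BytesCache: Send + Sync {
    fn get(&self, path: &str) -> Option<Arc<[u8]>>;
    fn set(&self, path: &str, bytes: Arc<[u8]>);
}

/// Unbounded in-memory map used only for illustration (no eviction).
pub struct MemoryCache<K, V: ?Sized> {
    map: RwLock<HashMap<K, Arc<V>>>,
}

impl<K, V: ?Sized> MemoryCache<K, V> {
    pub fn new() -> Self {
        Self { map: RwLock::new(HashMap::new()) }
    }
}

impl<K, V> ObjectCache<K, V> for MemoryCache<K, V>
where
    K: Eq + Hash + Send + Sync,
    V: Send + Sync,
{
    fn get(&self, key: &K) -> Option<Arc<V>> {
        self.map.read().unwrap().get(key).cloned()
    }

    fn set(&self, key: K, value: Arc<V>) {
        self.map.write().unwrap().insert(key, value);
    }
}

impl BytesCache for MemoryCache<String, [u8]> {
    fn get(&self, path: &str) -> Option<Arc<[u8]>> {
        self.map.read().unwrap().get(path).cloned()
    }

    fn set(&self, path: &str, bytes: Arc<[u8]>) {
        self.map.write().unwrap().insert(path.to_string(), bytes);
    }
}
```

Keeping the caches behind traits would let each engine plug in its own implementation (for example a bounded or distributed cache) instead of the in-memory map shown here.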
## Tasks
- ObjectCache
- [ ] https://github.com/apache/iceberg-rust/pull/1222
- [ ] https://github.com/apache/iceberg-rust/pull/1225
- BytesCache
- OpenDAL CacheLayer (TBD)
### Willingness to contribute
I can contribute to this feature independently
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]