purefunctions commented on issue #5122:
URL: https://github.com/apache/iceberg/issues/5122#issuecomment-1254395202

   @JanKaul I just got linked to this conversation from the Apache Arrow
Discord chat! I just (today) created my own project,
https://github.com/trust-in-rust/rustberg, with the modest goal of supporting
a small subset of Iceberg operations through the DataFusion project. The
iceberg-rs library that you linked was the first one that I saw but, as you
discovered, it only supports metadata files for now. I also found
https://github.com/joshuarobinson/rust_iceberg, which actually reads more than
metadata and supports querying through DataFusion. However, it doesn't support
the split planning based on partition and schema evolution that one has to do
when using Iceberg with evolving tables.
   
   @openinx Just like @snth, we also have a very small-scale, on-premises data
lake based on Iceberg. We use Iceberg to make it easier to migrate the lake to
a cloud later. We use Hive Metastore to store the latest metadata files and
NFS for table storage. We use Spark (a single node is sufficient for our needs)
to ingest data, and we provide a Python library for end users to query the
data. The query needs are modest: the targeted query is mostly handled by the
partitioning scheme, and the resulting data that needs to be queried further
can easily fit on one node. Currently we use PySpark in the user library, and
the JVM/Spark startup time and the general slowness introduced by PySpark are
something that we'd like to reduce. Hence we are looking at Rust for an
iceberg-lite kind of query API based on DataFusion.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

