Hi Iceberg Dev,
We are looking into Iceberg as a data lake solution to replace a legacy system that has been in place for many years. Our data (~10+ PB in total) is time-series tabular data. We built a proof of concept earlier that ended up with a design very similar to Iceberg's, especially the table spec. However, our use case has a few special requirements (supported by our legacy system) that are missing in Iceberg today:

- Our applications always expect rows sorted by timestamp when reading time-series data from the data lake.
- Our users do not want to deal with table partitioning. They expect the storage layer (or a data-lake middle layer) to optimize partitioning for them.

Our legacy system supports both by enforcing row order at write time and by running a background service that consolidates small data files into larger ones, improving storage usage and query performance. (The system does merge-on-read to resolve the intersecting time ranges that have not been consolidated yet.)

After we switch to Iceberg, to continue supporting the features above, it looks like we would have to:

1. use a special partition spec that always creates a single partition for any table,
2. build a background consolidation service on top of Iceberg's compaction API, and
3. build a new writer (we use Arrow) that enforces write order.

Would that be too much customization on top of what Iceberg has today? Or would you consider this a legitimate use case for Iceberg in the future? We noticed many ongoing efforts around topics like SortOrder, Merge-on-Read, and Row-delete that seem very relevant. We are happy to contribute to the community if our use case makes sense for Iceberg.

Thanks,
Yi
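P.S. To make the merge-on-read point concrete: what our legacy system does today is essentially a streaming k-way merge of sorted runs whose time ranges may overlap. A minimal Python sketch of the idea (all names here are ours and purely illustrative, not any Iceberg API):

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

# A "data file" here is just an iterable of (timestamp, payload) rows,
# already sorted by timestamp within the file. Files may cover
# intersecting time ranges until the background consolidation
# service merges them into larger, non-overlapping files.
Row = Tuple[int, str]

def merge_on_read(files: List[Iterable[Row]]) -> Iterator[Row]:
    """Stream rows from all unconsolidated files in global timestamp order."""
    # heapq.merge performs the k-way merge lazily, so memory use is
    # bounded by the number of open files, not the number of rows.
    return heapq.merge(*files, key=lambda row: row[0])

# Example: two small files with intersecting time ranges.
file_a = [(1, "a1"), (4, "a4"), (7, "a7")]
file_b = [(2, "b2"), (4, "b4"), (9, "b9")]

merged = list(merge_on_read([file_a, file_b]))
# Rows come back globally sorted by timestamp: 1, 2, 4, 4, 7, 9
```

The background consolidation service exists precisely to keep the number of overlapping runs small, so reads stay cheap; after consolidation a read degenerates to scanning a single sorted file.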