Several discussions took place during ApacheCon. In the spirit of the
Apache Way, I am taking the conversation online to share it with the larger
community and to capture requirements. Credit to Owen, who started this
discussion.

There are a number of scenarios where users want to partially rewrite file
blocks, and it would make sense to create a file system API to make these
operations efficient.

1. Apache Iceberg or other evolvable table formats.
These table formats need to update the table schema. The underlying files
are rewritten, but only a subset of blocks changes. It would be much more
efficient if a new file could be composed from some of the existing file
blocks.

2. GDPR compliance ("the right to erasure").
Files must be rewritten to remove a person's data on request. Again, a full
rewrite is inefficient because only a small set of file blocks is updated.

3. In-place erasure coding conversion.
I had a proposal to support atomically rewriting replicated files into
erasure coded files. This can be the building block to support auto-tiering.

Thoughts? What would be a good FS interface to support these requirements?
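
To make the question concrete, here is a minimal in-memory sketch of what a
"compose a file from existing blocks" call might look like. Everything in it
is an assumption for illustration only: the class BlockComposeSketch and the
methods writeBlock, compose, blocksOf and read are hypothetical names, not an
existing Hadoop or Ozone API, and real concerns such as block alignment,
atomicity, permissions and garbage collection are ignored.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model; NOT an existing Hadoop or Ozone API. */
class BlockComposeSketch {
    // Block store: block id -> immutable block bytes.
    private final Map<Long, byte[]> blocks = new HashMap<>();
    // Namespace: path -> ordered list of block ids that make up the file.
    private final Map<String, List<Long>> files = new HashMap<>();
    private long nextBlockId = 0;

    /** Stores new data as a block and returns its id (hypothetical call). */
    long writeBlock(byte[] data) {
        long id = nextBlockId++;
        blocks.put(id, data.clone());
        return id;
    }

    /** Creates a file that references the given blocks without copying them. */
    void compose(String path, List<Long> blockIds) {
        for (Long id : blockIds) {
            if (!blocks.containsKey(id)) {
                throw new IllegalArgumentException("unknown block " + id);
            }
        }
        files.put(path, new ArrayList<>(blockIds));
    }

    /** Returns a copy of the block list backing the file. */
    List<Long> blocksOf(String path) {
        return new ArrayList<>(files.get(path));
    }

    /** Reads the file by concatenating its blocks in order. */
    byte[] read(String path) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Long id : files.get(path)) {
            byte[] b = blocks.get(id);
            out.write(b, 0, b.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        BlockComposeSketch fs = new BlockComposeSketch();
        long a = fs.writeBlock("aaa".getBytes(StandardCharsets.UTF_8));
        long b = fs.writeBlock("bbb".getBytes(StandardCharsets.UTF_8));
        long c = fs.writeBlock("ccc".getBytes(StandardCharsets.UTF_8));
        fs.compose("/table/v1", Arrays.asList(a, b, c));

        // Rewrite only the middle block; the other two are shared with v1.
        long b2 = fs.writeBlock("XXX".getBytes(StandardCharsets.UTF_8));
        fs.compose("/table/v2", Arrays.asList(a, b2, c));

        System.out.println(new String(fs.read("/table/v2"), StandardCharsets.UTF_8));
    }
}
```

In this sketch the second file costs one new block plus a metadata update,
which is the saving all three scenarios above are after. Whether the unit
should be a block, a byte range, or something else is part of the open
question.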

For Ozone folks, Ritesh opened a Jira: HDDS-7297
<https://issues.apache.org/jira/browse/HDDS-7297> but I figured a larger
conversation should happen so that we can take other FS implementations
into consideration.

Thanks,
Weichiu
