Hi Paimon community,

I would like to start a discussion about supporting Deletion Vectors (DVs)
for DataEvolution tables.

In many AI scenarios, users frequently need to delete data (e.g., removing
low-quality samples, deduplicating, or excluding biased training data). The
existing file-level DV in AppendTable is incompatible with DataEvolution,
and rewriting files for random deletions causes small-file explosion and
index invalidation.

I prepared a short design document proposing a range-based DV approach,
along with solutions for Merge Into, Compaction, and Vector Index
adaptation:
https://docs.google.com/document/d/14XHZCgtz_487eKq8k0s_hVfaVA9ETZw4rle19-qN7hY/edit?usp=sharing

Could you please take a look and share your thoughts?

Best,
wang
