>>I think HFile upgrade in particular is more complicated than you think.
>>We currently have production traffic running with HFileV1. It has a
>>5-min SLA. We can't afford to take the entire downtime to rewrite 100GB
>>(or whatever) worth of data. We need to do this while the cluster is live.
>
>AFAIK that's how it's done, V1 files are being rewritten to V2 when a
>compaction happens. You don't have to do some offline processing
>before getting the cluster back online.

Correct. Sorry for the confusion. I just meant that we shouldn't get rid of the HFileV1 Reader, because it is what allows upgrades to happen online as compactions run. Removing this functionality, or refactoring it into a separate transformation utility, would have negative consequences. An LSM tree (LSMT) is designed with immutability as a core principle: files are never modified in place, only replaced by compaction. So I think we should lean towards upgrading persistent data online, by rewriting it in the new format during compaction, rather than relying on migration utilities.
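
To make the point concrete, here is a minimal sketch of the idea in Java. It is a toy model, not HBase's actual code: `StoreFile`, `readAll`, `readV1`, `readV2` and `compact` are hypothetical names made up for illustration. It only shows why the V1 read path has to stay around: compaction reads old files through it and writes the result in the latest format, so no offline rewrite is needed.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Toy model (hypothetical names, not HBase classes): a store file is an
// immutable sorted map tagged with the format version it was written in.
class OnlineUpgradeSketch {
    static final int LATEST_VERSION = 2;

    static final class StoreFile {
        final int version;
        final SortedMap<String, String> cells;

        StoreFile(int version, Map<String, String> cells) {
            this.version = version;
            // Defensive copy plus unmodifiable view: files never change after being written.
            this.cells = Collections.unmodifiableSortedMap(new TreeMap<>(cells));
        }
    }

    // Version dispatch. Dropping the V1 branch would make old files unreadable
    // until some separate tool had rewritten them.
    static SortedMap<String, String> readAll(StoreFile f) {
        switch (f.version) {
            case 1: return readV1(f);
            case 2: return readV2(f);
            default:
                throw new IllegalArgumentException("unsupported version " + f.version);
        }
    }

    // In a real system these two would decode different on-disk layouts;
    // here they are stand-ins that return the same in-memory map.
    static SortedMap<String, String> readV1(StoreFile f) { return f.cells; }
    static SortedMap<String, String> readV2(StoreFile f) { return f.cells; }

    static String get(StoreFile f, String key) {
        return readAll(f).get(key);
    }

    // Compaction: merge inputs (ordered oldest to newest, so newer values win)
    // and always write the output in the latest format. The inputs are left
    // untouched; the upgrade is a side effect of normal compaction.
    static StoreFile compact(List<StoreFile> inputs) {
        TreeMap<String, String> merged = new TreeMap<>();
        for (StoreFile f : inputs) {
            merged.putAll(readAll(f));
        }
        return new StoreFile(LATEST_VERSION, merged);
    }

    public static void main(String[] args) {
        StoreFile oldFile = new StoreFile(1, Map.of("row1", "a", "row2", "b"));
        StoreFile newFile = new StoreFile(2, Map.of("row2", "c"));
        StoreFile compacted = compact(List.of(oldFile, newFile));
        System.out.println("version=" + compacted.version + " cells=" + compacted.cells);
    }
}
```

Reads keep working against a mix of V1 and V2 files the whole time, and the V1 files simply disappear as compactions replace them.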