Davis-Zhang-Onehouse commented on PR #13414: URL: https://github.com/apache/hudi/pull/13414#issuecomment-2981141127
@yihua @vinothchandar New items to factor in: backward and forward compatibility. I spotted major issues, and the PR is blocked on your feedback.

# Compatibility

## Forward compatibility

If the SI uses version 2 (hash partitioning on the data column value only) and the Hudi binary is an old one, Hudi has no concept of an index version and treats the new SI version as if it were the old one. As a result:

### Read path

This should be fine, since reads use prefix lookup, which is naturally compatible with the new partitioning strategy.

### Write path

The write path is broken: the old Hudi binary writes to the new index version using the old partitioning strategy. Worse, the index version is not updated, because the old binary has no such logic. We end up with a corrupted version 2 index, since the old binary does not conform to the version 2 protocol for updating the index.

## Backward compatibility

This should be fine, as the new Hudi binary recognizes the version (or the absence of one) and adapts accordingly.

# Fundamental limitation of the index version design

Old Hudi binaries only recognize and respect the table version. Introducing an index version means users must run a Hudi version that recognizes and honors it. The standard industry procedure is:

- Introduce a "compatibility patch" that recognizes the version and backs off if it is a future version (the old Hudi binary chooses not to use the index even if one exists).
- Users must be aware of all readers and writers that touch a Hudi table. If the table uses SI version 2, users must make sure every Hudi version in use is at or above the compatibility patch.
- This places a burden on the user, and failing to meet it means the SI is silently corrupted, causing correctness issues, which is not acceptable. We need a place to guide users, and this is far more cumbersome than a table version upgrade.
- If all readers and writers are managed by a service provider, this might not be an issue: just introduce the compatibility patch and we are all good.
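
To make the "compatibility patch" idea concrete, here is a minimal sketch in Python of the back-off check such a patch could perform. It is not Hudi code: `MAX_SUPPORTED_SI_VERSION`, `DEFAULT_SI_VERSION`, `plan_read`, and `plan_write` are hypothetical names, and rejecting writes to an unknown future version is my assumption about how a patched writer would avoid the corruption described above (the comment itself only states that the binary should stop using the index).

```python
from enum import Enum
from typing import Optional

# Hypothetical: highest SI layout version this (patched, older) binary understands.
MAX_SUPPORTED_SI_VERSION = 1
# Hypothetical: an index with no recorded version is treated as the original layout.
DEFAULT_SI_VERSION = 1


class Action(Enum):
    USE_INDEX = "use_index"    # version is understood, proceed normally
    SKIP_INDEX = "skip_index"  # reader ignores the index even though one exists
    REJECT = "reject"          # writer refuses rather than write the wrong layout


def effective_version(recorded: Optional[int]) -> int:
    """Map an absent index version to the default (pre-versioning) layout."""
    return DEFAULT_SI_VERSION if recorded is None else recorded


def plan_read(recorded: Optional[int]) -> Action:
    """Readers back off from an index written in a future layout."""
    if effective_version(recorded) > MAX_SUPPORTED_SI_VERSION:
        return Action.SKIP_INDEX
    return Action.USE_INDEX


def plan_write(recorded: Optional[int]) -> Action:
    """Writers must not update a future-version index with the old strategy."""
    if effective_version(recorded) > MAX_SUPPORTED_SI_VERSION:
        return Action.REJECT
    return Action.USE_INDEX
```

With these assumed constants, a table whose SI is at version 2 yields `SKIP_INDEX` on read and `REJECT` on write, while a table with no recorded version behaves as before. A binary without any such check is the unpatched case discussed above.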
