vinothchandar commented on pull request #4695: URL: https://github.com/apache/hudi/pull/4695#issuecomment-1026923494
@yihua thanks for taking a stab at this.

> since if we can unlock the HFile format for metadata table, Presto, Trino, with the first WIP PR, there is no real need to upgrade hbase version to 2.x

The real issue with HFile usage in Hudi has been the bundling (shading and making the size smaller). HBase 2.x vs 1.x is more about getting on a version that is not 5 years old :). I don't think we saw any large perf improvements between 1.x and 2.x. I think even with HBase 1.x we are on version 3 of HFile? (http://www.devdoc.net/bigdata/hbase-0.98.7-hadoop1/book/hfilev3.html ; the HFile format has its own version, like the Hudi table version.) @codope can chime in here as well. The urgency to do this stems from finalizing this before all the indexing work lands.

> Should I just put the generated java classes inside the repo and get rid of the proto related files altogether?

Need to take a closer look. If proto is used to define the storage format, maybe we should keep it in? How big is that?

> Regarding the new dependencies pulled in, I can further trim the list down if some can cause conflict, e.g., commons-lang3, protobuf:

Right. The desired way for us is to trim the HFile code down to a much, much smaller amount. We should not bring in any new dependencies that Hudi has gotten rid of, such as commons-lang and guava. Otherwise it defeats the purpose a little bit.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
