yihua commented on pull request #4695:
URL: https://github.com/apache/hudi/pull/4695#issuecomment-1026487129
cc @vinothchandar
My approach is to pull the HFile-related classes from the HBase repo at
release 2.4.9 into a `hudi-io` module in the hudi repo, under the renamed
package `org.apache.hudi.hbase` instead of `org.apache.hadoop.hbase`. I trimmed
some classes to limit the number of deps pulled in. All the backward
compatibility logic for KeyValue.KVComparator (hbase1) vs CellComparator
(hbase2) is pulled in as well, so we can control it. This way, any hudi logic
using the HFile format uses the internal `org.apache.hudi.hbase` classes, while
SparkHoodieHBaseIndex still uses the hbase lib with `org.apache.hadoop.hbase`
classes (the two are independent).
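To make the package rename concrete, here is a minimal sketch of the name mapping it implies. This is not code from the PR: the class `HFileClassRelocation` and its `relocate` method are hypothetical, and the class names passed in are only examples of names under the two prefixes mentioned above.

```java
// Illustrative sketch only; HFileClassRelocation is a hypothetical helper,
// not part of hudi-io or HBase.
final class HFileClassRelocation {
    // Upstream HBase package prefix and the renamed prefix described above.
    static final String HBASE_PREFIX = "org.apache.hadoop.hbase.";
    static final String HUDI_PREFIX = "org.apache.hudi.hbase.";

    private HFileClassRelocation() {}

    // Rewrites a class name under the HBase prefix to the renamed package.
    // Names outside that prefix are returned unchanged.
    static String relocate(String className) {
        if (className.startsWith(HBASE_PREFIX)) {
            return HUDI_PREFIX + className.substring(HBASE_PREFIX.length());
        }
        return className;
    }

    public static void main(String[] args) {
        // A class under the HBase prefix gets the renamed package.
        System.out.println(relocate("org.apache.hadoop.hbase.io.hfile.HFile"));
        // The hbase1 comparator name is mapped the same way.
        System.out.println(relocate("org.apache.hadoop.hbase.KeyValue$KVComparator"));
        // Non-HBase Hadoop classes are left alone.
        System.out.println(relocate("org.apache.hadoop.fs.Path"));
    }
}
```

Because only the `org.apache.hadoop.hbase.` prefix is rewritten, code that depends on the stock hbase lib (such as SparkHoodieHBaseIndex) keeps resolving the original class names.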
A few things to finalize:
- I'm questioning whether we should flip the hbase version in the hudi repo at
all. If the first WIP PR unlocks the HFile format for the metadata table,
Presto, and Trino, there is no real need to upgrade the hbase version to 2.x,
which could introduce compatibility issues for SparkHoodieHBaseIndex. Am I
missing anything here? wdyt?
- Right now, protobuf is used to generate the proto classes, and I pulled in
the .proto files and protobuf libs (hudi-io-proto module). Should I just check
the generated java classes into the repo and get rid of the proto-related
files altogether? Alternatively, I can keep the hudi-io-proto module and have
hudi-io include the generated code without depending on hudi-io-proto, so we
can still evolve the protos in the future.
- Regarding the new dependencies pulled in, I can trim the list down further
if some of them can cause conflicts, e.g., `commons-lang3`, `protobuf`:
```
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <scope>provided</scope>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-hdfs</artifactId>
  <scope>provided</scope>
</dependency>
<dependency>
  <groupId>org.apache.hbase.thirdparty</groupId>
  <artifactId>hbase-shaded-protobuf</artifactId>
  <version>4.0.1</version>
</dependency>
<dependency>
  <groupId>org.apache.hbase.thirdparty</groupId>
  <artifactId>hbase-shaded-miscellaneous</artifactId>
  <version>4.0.1</version>
</dependency>
<dependency>
  <groupId>org.apache.hbase.thirdparty</groupId>
  <artifactId>hbase-shaded-gson</artifactId>
  <version>4.0.1</version>
</dependency>
<dependency>
  <groupId>org.apache.hbase.thirdparty</groupId>
  <artifactId>hbase-shaded-netty</artifactId>
  <version>4.0.1</version>
</dependency>
<dependency>
  <groupId>org.apache.htrace</groupId>
  <artifactId>htrace-core4</artifactId>
  <version>4.2.0-incubating</version>
</dependency>
<dependency>
  <groupId>org.apache.commons</groupId>
  <artifactId>commons-lang3</artifactId>
  <version>3.12.0</version>
  <scope>compile</scope>
</dependency>
<dependency>
  <groupId>org.apache.yetus</groupId>
  <artifactId>audience-annotations</artifactId>
  <version>0.13.0</version>
</dependency>
<dependency>
  <groupId>com.esotericsoftware</groupId>
  <artifactId>kryo-shaded</artifactId>
  <version>4.0.2</version>
</dependency>
```