Ethan Guo created HUDI-3674:
-------------------------------
Summary: Remove unnecessary HBase-related dependencies from
bundles if there is any
Key: HUDI-3674
URL: https://issues.apache.org/jira/browse/HUDI-3674
Project: Apache Hudi
Issue Type: Improvement
Reporter: Ethan Guo
[https://github.com/apache/hudi/pull/5004/files] A follow-up of HUDI-1180.
vinothchandar 6 days ago Member
Is this the absolute minimal set of artifacts needed?
alexeykudinkin 6 days ago Contributor
This need not be done as part of this PR, but I actually want to suggest going
one step further:
Since we mostly rely on HFile and the classes it depends on, can we try to
filter out the packages whose removal won't break it?
My hunch is that we can greatly reduce the 16 MB overhead just by cleaning up
all the stuff that is bolted onto HBase.
codope 4 days ago Member
That's a good idea. In fact, I've tried it, but verifying the result is a very
manual, time-consuming process. I gave up after a few failures. We also need to
keep future upgrades in mind. But I would be very happy to reduce the bundle
size in any way we can, and we should take another stab at this idea in the
future.
yihua 4 days ago Author Member
Yeah, that's good to have. The problem, as @codope pointed out, is that such a
process is time-consuming. For now, what I can say is that the newly added
artifacts are necessary: I started with the old pom and incrementally added
new artifacts whenever I saw a NoClassDefFoundError, until every test passed.
One thing we may try later is to add hudi-hbase-shaded, trim it by excluding
transitive dependencies, and depend only on hudi-hbase-shaded here.
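The incremental approach described in this comment can be sketched roughly as
follows. This is only an illustration, not the actual Hudi build tooling: the
`run_tests` callback, the class-to-artifact mapping, and all artifact and class
names are hypothetical placeholders.

```python
from typing import Callable, Dict, Optional, Set


def grow_artifact_set(
    initial: Set[str],
    run_tests: Callable[[Set[str]], Optional[str]],
    class_to_artifact: Dict[str, str],
) -> Set[str]:
    """Add artifacts one at a time until the tests report no missing class.

    run_tests is a hypothetical callback: it returns the name of the first
    class that failed to load (as in a NoClassDefFoundError), or None when
    every test passes.
    """
    included = set(initial)
    while True:
        missing = run_tests(included)
        if missing is None:
            return included
        artifact = class_to_artifact.get(missing)
        if artifact is None or artifact in included:
            # No known artifact fixes this failure; stop instead of looping.
            raise RuntimeError(f"cannot resolve missing class {missing}")
        included.add(artifact)


# Toy data, made up for illustration: which artifact provides which class.
CLASS_TO_ARTIFACT = {
    "org.example.HFileReader": "example-hbase-server",
    "org.example.Bytes": "example-hbase-common",
}
REQUIRED_CLASSES = ["org.example.HFileReader", "org.example.Bytes"]


def fake_run_tests(included: Set[str]) -> Optional[str]:
    """Stand-in for a real test run: report the first class not yet provided."""
    for cls in REQUIRED_CLASSES:
        if CLASS_TO_ARTIFACT[cls] not in included:
            return cls
    return None


result = grow_artifact_set(
    {"example-old-pom-artifact"}, fake_run_tests, CLASS_TO_ARTIFACT
)
```

The resulting set is sufficient for the tests to pass, but this procedure never
removes anything, so it cannot show on its own that each artifact is minimal.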
alexeykudinkin 3 days ago Contributor
Yeah, it's a tedious manual process for sure, but I think we can do it pretty
fast: we just look at the packages imported by HFile, then at what those
imported files import in turn, and so on. After that, we can run the tests to
check whether we collected the set properly.
The hypothesis is that this set should be reasonably bounded (why wouldn't it
be?), so this iteration should be pretty fast.
Can you please create a task and link it here to follow up?
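The "imports of imports, and so on" idea in this comment amounts to computing a
transitive closure over an import graph. A minimal sketch, assuming the graph
has already been extracted by some other means; the class names below are
placeholders, not real HBase classes:

```python
from collections import deque
from typing import Dict, Iterable, Set


def import_closure(root: str, imports: Dict[str, Iterable[str]]) -> Set[str]:
    """Return root plus everything reachable from it through imports (BFS)."""
    seen = {root}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for dep in imports.get(current, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


# Hypothetical import graph: each key imports the names in its list.
IMPORTS = {
    "HFile": ["BlockReader", "Bytes"],
    "BlockReader": ["Bytes", "Checksum"],
    "Bytes": [],
    "Checksum": ["Bytes"],
    "Coprocessor": ["RegionServer"],  # never reached from HFile
    "RegionServer": ["Coprocessor"],
}

needed = import_closure("HFile", IMPORTS)
removable = set(IMPORTS) - needed  # candidates for exclusion from the bundle
```

Static imports do not capture classes loaded reflectively, so running the tests
afterwards, as suggested above, would still be needed to confirm the set.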
--
This message was sent by Atlassian Jira
(v8.20.1#820001)