[
https://issues.apache.org/jira/browse/HUDI-3674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Raymond Xu updated HUDI-3674:
-----------------------------
Fix Version/s: 0.12.3
> Remove unnecessary HBase-related dependencies from bundles if there is any
> --------------------------------------------------------------------------
>
> Key: HUDI-3674
> URL: https://issues.apache.org/jira/browse/HUDI-3674
> Project: Apache Hudi
> Issue Type: Improvement
> Components: dependencies
> Reporter: Ethan Guo
> Priority: Blocker
> Fix For: 0.13.1, 0.12.3
>
>
> [https://github.com/apache/hudi/pull/5004/files] A follow-up of HUDI-1180.
> vinothchandar 6 days ago Member
> Is this the absolute minimal set of artifacts needed?
>
> alexeykudinkin 6 days ago Contributor
> This need not be taken up as part of this PR, but I actually want to suggest
> going one step further:
> Since we're mostly reliant on HFile and the classes it depends on, can we
> try to filter out the packages that can be removed without breaking it?
> My hunch is that we can greatly reduce the 16 MB overhead by just cleaning
> up all the stuff that is bolted onto HBase.
>
> codope 4 days ago Member
> That's a good idea. In fact, I've tried it out, but verifying it is a very
> manual, time-consuming process. I gave up after a few failures. We also need
> to keep future upgrades in mind. But I would be very happy to reduce the
> bundle size in any way we can, and we should take another stab at this idea
> in the future.
>
> yihua 4 days ago Author Member
> Yeah, that's good to have. The problem, as @codope pointed out, is that such a
> process is time-consuming. For now, what I can say is that the newly added
> artifacts are necessary: I started with the old pom and incrementally added
> new artifacts whenever I saw a NoClassDefFoundError, until every test passed.
> One thing we may try later is to add hudi-hbase-shaded, trim it by excluding
> transitive dependencies, and depend only on hudi-hbase-shaded here.
>
> alexeykudinkin 3 days ago Contributor
> Yeah, it's a tedious manual process for sure, but I think we can do it pretty
> fast: we just look at the packages imported by HFile, then at what those
> imported files import in turn, and so on. After that, we can run the tests to
> check whether we collected the set properly.
> The hypothesis is that this set should be reasonably bounded (why wouldn't
> it be?), so this iteration should be pretty fast.
> Can you please create a task and link it here to follow up?
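
The approach suggested in the last comment above (start from HFile, follow its imports, then the imports of those, and so on) amounts to a transitive closure over an import graph. Below is a minimal sketch of that idea, not the procedure actually used in the PR. It assumes the per-class import map has already been extracted by some means (for example, a class-dependency tool such as jdeps); the map `TOY_IMPORTS`, the `toy.*` class names, and the helper names are all hypothetical and do not describe real HBase dependencies.

```python
from collections import deque


def transitive_closure(imports, root):
    """Return every class reachable from `root` by following imports."""
    seen = {root}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        # Classes missing from the map are treated as having no imports.
        for dep in imports.get(current, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


def packages_of(classes):
    """Collapse fully qualified class names into their package names."""
    return {name.rsplit(".", 1)[0] for name in classes}


# Hypothetical toy graph; these are NOT real HBase dependencies.
TOY_IMPORTS = {
    "toy.hfile.HFile": {"toy.hfile.HFileBlock", "toy.util.Bytes"},
    "toy.hfile.HFileBlock": {"toy.io.ByteBuff", "toy.util.Bytes"},
    "toy.io.ByteBuff": set(),
    "toy.util.Bytes": set(),
    "toy.master.HMaster": {"toy.util.Bytes"},  # not reachable from HFile
}

needed = transitive_closure(TOY_IMPORTS, "toy.hfile.HFile")
keep_packages = packages_of(needed)
# Packages that appear in the graph but are never reached from the root.
drop_packages = packages_of(TOY_IMPORTS) - keep_packages

print(sorted(keep_packages))
print(sorted(drop_packages))
```

A static import graph like this will not see classes that are loaded reflectively or through service loaders, which is why the thread proposes running the tests afterwards to confirm that the collected set is complete.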
--
This message was sent by Atlassian Jira
(v8.20.10#820010)