Ethan Guo created HUDI-3674:
-------------------------------

             Summary: Remove unnecessary HBase-related dependencies from 
bundles if there is any
                 Key: HUDI-3674
                 URL: https://issues.apache.org/jira/browse/HUDI-3674
             Project: Apache Hudi
          Issue Type: Improvement
            Reporter: Ethan Guo


A follow-up to HUDI-1180, from the review discussion on [https://github.com/apache/hudi/pull/5004/files]:

vinothchandar:

Is this the absolute minimal set of artifacts needed?

alexeykudinkin:

This need not be part of this PR, but I'd actually suggest going one step
further: since we rely mostly on HFile and the classes it depends on, can we
try to filter out the packages whose removal won't break it?

My hunch is that we can greatly reduce the 16 MB overhead just by cleaning up
all the stuff that is bolted onto HBase.

codope:

That's a good idea. In fact, I've tried it, but verifying the result is a very
manual, time-consuming process, and I gave up after a few failures. Future
upgrades also have to be kept in mind. But I would be very happy to reduce the
bundle size in any way we can, and we should take another stab at this idea in
the future.

yihua (PR author):

Yeah, that would be good to have. The problem, as @codope pointed out, is that
the process is time-consuming. For now, what I can say is that the newly added
artifacts are necessary: I started from the old pom and incrementally added new
artifacts whenever I saw a NoClassDefFoundError, until every test passed.

One thing we could try later is to add a hudi-hbase-shaded module, trim it by
excluding transitive dependencies, and depend only on hudi-hbase-shaded here.
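A minimal sketch of what the bundle-side dependency could look like under that idea. It assumes a hudi-hbase-shaded module exists and already contains, shaded, every HBase class the bundles need; the coordinates below are hypothetical. The wildcard exclusion (supported since Maven 3.2.1) keeps Maven from pulling any of the module's transitive dependencies into the bundle.

```xml
<!-- Hypothetical coordinates: a shaded HBase module that the bundles depend on -->
<dependency>
  <groupId>org.apache.hudi</groupId>
  <artifactId>hudi-hbase-shaded</artifactId>
  <version>${project.version}</version>
  <exclusions>
    <!-- Wildcard exclusion: drop every transitive dependency of the module -->
    <exclusion>
      <groupId>*</groupId>
      <artifactId>*</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

The trade-off is that anything missing from the shaded jar now surfaces only at runtime, which is why the test suite still has to act as the safety net.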

alexeykudinkin:

Yeah, it's a tedious manual process for sure, but I think we can do it pretty
fast: we look at the packages imported by HFile, then at the classes those
files import, and so on. After that, we can run the tests to check whether we
collected everything properly.

The hypothesis is that this set should be reasonably bounded (why wouldn't it
be?), so this iteration should be pretty fast.

Can you please create a task and link it here to follow up?
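The procedure alexeykudinkin describes is a transitive closure over the import graph, starting from HFile. A minimal sketch in Python, assuming the import graph has already been extracted into a dict: the IMPORTS map below is hand-written and hypothetical, and its edges are illustrative rather than HBase's real dependencies. Source-level imports miss same-package references and reflection, so a bytecode-level tool such as the JDK's jdeps would give a more reliable graph, and the test run still serves as the final check.

```python
from collections import deque
from typing import Dict, Iterable, Set


def import_closure(roots: Iterable[str], imports: Dict[str, Set[str]]) -> Set[str]:
    """Return every class reachable from `roots` by following import edges (BFS)."""
    seen: Set[str] = set()
    queue = deque(roots)
    while queue:
        cls = queue.popleft()
        if cls in seen:
            continue  # already visited; also guards against import cycles
        seen.add(cls)
        # Classes with no recorded imports are treated as leaves.
        queue.extend(imports.get(cls, ()))
    return seen


def packages_of(classes: Iterable[str]) -> Set[str]:
    """Map fully qualified class names to their package names."""
    return {c.rsplit(".", 1)[0] for c in classes if "." in c}


# Hypothetical, hand-written import graph used only for illustration.
IMPORTS = {
    "org.apache.hadoop.hbase.io.hfile.HFile": {
        "org.apache.hadoop.hbase.io.hfile.CacheConfig",
        "org.apache.hadoop.hbase.KeyValue",
    },
    "org.apache.hadoop.hbase.io.hfile.CacheConfig": {
        "org.apache.hadoop.hbase.io.hfile.BlockCache",
    },
    "org.apache.hadoop.hbase.KeyValue": {"org.apache.hadoop.hbase.util.Bytes"},
    # Not reachable from HFile, so its packages are candidates for exclusion.
    "org.apache.hadoop.hbase.client.Connection": {"org.apache.hadoop.hbase.ipc.RpcClient"},
}

if __name__ == "__main__":
    needed = import_closure(["org.apache.hadoop.hbase.io.hfile.HFile"], IMPORTS)
    print(sorted(packages_of(needed)))
    # ['org.apache.hadoop.hbase', 'org.apache.hadoop.hbase.io.hfile', 'org.apache.hadoop.hbase.util']
```

Keeping whole packages is coarser than keeping individual classes, but it maps directly onto the package-level include/exclude filters that shading configurations typically use.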



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
