CalvinKirs opened a new pull request, #57202:
URL: https://github.com/apache/doris/pull/57202

   
   
   ### Background
   
   Previously, BE pulled in the official Hadoop distribution through the thirdparty module. This compiled and packaged Hadoop's full dependency set, including large modules that BE does not use, such as YARN and MapReduce. As a result:
   
   - The build artifact contained many duplicate jars.
   - The total size was unnecessarily large (hundreds of MB).
   
   Yet BE only acts as an HDFS client, not a Hadoop runtime, so most of these dependencies were never needed.
   
   ### Changes in this PR
   
   - Move all HDFS-related jars to FE for unified dependency management, since FE already uses HDFS extensively.
   - Keep the necessary native/C++ dependencies under thirdparty for BE.
   - Do not build an uber-jar for HDFS (unlike other BE thirdparty libs).
   - Exclude unused Hadoop modules (e.g., YARN, MapReduce, tools).
   
   The resulting target structure now includes:
   ```
   target/
   ├── classes
   ├── lib/
   │   └── (~57M total)
   ├── hadoop-deps.jar
   └── ...
   ```
   
   Compared with the previous build, the total artifact size is reduced by ~150 MB.
   
   ### Rationale
   
   Since BE acts only as an HDFS client, managing the Java-side HDFS dependencies in FE is cleaner and easier to maintain. If BE later needs custom logic (e.g., an optimized HDFS client), we can extend or override the relevant classes and control the class loading order.
   
   ### Impact
   
   - Build size reduced by ~150 MB.
   - Simpler dependency graph.
   
   ### Notes
   
   - Native/C++ HDFS dependencies remain managed under thirdparty.
   - FE now provides the jars that BE needs at runtime.
   - No changes to the existing C++ build logic are needed.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

