cnauroth commented on PR #8243:
URL: https://github.com/apache/hadoop/pull/8243#issuecomment-3880796276

   > what about _not shading these_?
   > 
   > BTW, what's the size of the distribution tar on 3.5.0 as gcs and cos both bundle a lot...even after stripping out aws bundle I worry we're at risk of crossing that 1 GB threshold
   
   I'm getting ~500 MB for the distribution tarball. See below for a breakdown of the heaviest hitters. hadoop-gcp is one of the largest jars. Within hadoop-gcp, the biggest contributors are Protobuf, Guava, gRPC and the Netty native binaries.
   
   From experience working on gs:// in the previous Google-owned repo, we found that full shading was needed to guarantee compatibility with user workloads. The biggest challenge is staying aligned on the Protobuf and Guava versions, as the underlying GCS SDK moves those along at its own pace.
   
   ```
   # share/hadoop directories, size descending
   > du -b -s share/hadoop/* | sort -nr | head
   187592533    share/hadoop/yarn
   156284851    share/hadoop/common
   105883478    share/hadoop/client
   83782079     share/hadoop/hdfs
   29008259     share/hadoop/mapreduce
   22844157     share/hadoop/tools
   ```
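
To relate the directory sizes above to the 1 GB concern, the byte counts can be totaled. This is a minimal sketch and not part of the PR: `sum_bytes` is a hypothetical helper name, and it assumes input in `du -b` format (byte count in the first column).

```shell
# Hypothetical helper: sum the first column of `du -b` style output
# read from stdin and print the total in bytes and MiB.
sum_bytes() {
  awk '{ total += $1 } END { printf "%d bytes (%.1f MiB)\n", total, total / 1048576 }'
}
```

For example, `du -b -s share/hadoop/* | sum_bytes`. Applied to the six directories listed above, it reports 585395357 bytes (558.3 MiB), which is the unpacked size of those directories rather than the compressed tarball size.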
   
   ```
   # Biggest jar sizes, descending
   > for x in $(find . -name '*.jar'); do du -b $x; done | sort -nr | head
   55992250     ./share/hadoop/client/hadoop-client-minicluster-3.5.0-SNAPSHOT.jar
   35393191     ./share/hadoop/common/lib/hadoop-gcp-3.5.0-SNAPSHOT.jar
   34558357     ./share/hadoop/yarn/timelineservice/lib/hbase-shaded-client-byo-hadoop-2.6.3-hadoop3.jar
   30080918     ./share/hadoop/client/hadoop-client-runtime-3.5.0-SNAPSHOT.jar
   19810310     ./share/hadoop/client/hadoop-client-api-3.5.0-SNAPSHOT.jar
   9204801      ./share/hadoop/tools/lib/kafka-clients-3.9.0.jar
   8451859      ./share/hadoop/common/lib/bcprov-jdk18on-1.82.jar
   6687870      ./share/hadoop/tools/lib/zstd-jni-1.5.6-4.jar
   6557826      ./share/hadoop/hdfs/hadoop-hdfs-3.5.0-SNAPSHOT-tests.jar
   6459241      ./share/hadoop/hdfs/hadoop-hdfs-3.5.0-SNAPSHOT.jar
   ```
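
The jar listing above can also be produced without a shell loop, which avoids word-splitting problems if a path contains spaces. This is a sketch only: `biggest_jars` is a hypothetical function name, and it assumes GNU `du` (for `-b`), the same tool used in the commands above.

```shell
# Hypothetical helper: list the N largest jar files under a directory,
# largest first. Arguments: directory (default .) and count (default 10).
# Assumes GNU du, whose -b flag reports apparent size in bytes.
biggest_jars() {
  dir=${1:-.}
  count=${2:-10}
  # -exec ... {} + passes file names to du directly, so paths with
  # spaces are handled without relying on shell word splitting.
  find "$dir" -type f -name '*.jar' -exec du -b {} + | sort -nr | head -n "$count"
}
```

Running `biggest_jars . 10` from the root of the unpacked distribution should give the same kind of listing as the loop above.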
   
   ```
   # Within hadoop-gcp, Protobuf and Guava are the biggest contributors
   > du -b -s com/google/cloud/hadoop/repackaged/ossgcs/com/google/* | sort -nr
   5792042      com/google/cloud/hadoop/repackaged/ossgcs/com/google/protobuf
   5082272      com/google/cloud/hadoop/repackaged/ossgcs/com/google/common
   4900043      com/google/cloud/hadoop/repackaged/ossgcs/com/google/cloud
   4559447      com/google/cloud/hadoop/repackaged/ossgcs/com/google/api
   4302946      com/google/cloud/hadoop/repackaged/ossgcs/com/google/storage
   1042388      com/google/cloud/hadoop/repackaged/ossgcs/com/google/monitoring
   859338       com/google/cloud/hadoop/repackaged/ossgcs/com/google/auth
   727606       com/google/cloud/hadoop/repackaged/ossgcs/com/google/rpc
   589946       com/google/cloud/hadoop/repackaged/ossgcs/com/google/gson
   499635       com/google/cloud/hadoop/repackaged/ossgcs/com/google/longrunning
   451096       com/google/cloud/hadoop/repackaged/ossgcs/com/google/iam
   285057       com/google/cloud/hadoop/repackaged/ossgcs/com/google/re2j
   85889        com/google/cloud/hadoop/repackaged/ossgcs/com/google/type
   ```
   
   ```
   # Also gRPC
   > du -b -s com/google/cloud/hadoop/repackaged/ossgcs/io/* | sort -nr
   63209125     com/google/cloud/hadoop/repackaged/ossgcs/io/grpc
   1857154      com/google/cloud/hadoop/repackaged/ossgcs/io/opentelemetry
   948329       com/google/cloud/hadoop/repackaged/ossgcs/io/opencensus
   16788        com/google/cloud/hadoop/repackaged/ossgcs/io/perfmark
   ```
   
   ```
   # Also the Netty native binaries
   > du -b -s META-INF/native/* | sort -nr
   2867712      META-INF/native/com_google_cloud_hadoop_repackaged_gcs_io_grpc_netty_shaded_netty_tcnative_windows_x86_64.dll
   2684992      META-INF/native/libcom_google_cloud_hadoop_repackaged_gcs_io_grpc_netty_shaded_netty_tcnative_osx_x86_64.jnilib
   2684104      META-INF/native/libcom_google_cloud_hadoop_repackaged_gcs_io_grpc_netty_shaded_netty_tcnative_linux_x86_64.so
   2437120      META-INF/native/libcom_google_cloud_hadoop_repackaged_gcs_io_grpc_netty_shaded_netty_tcnative_osx_aarch_64.jnilib
   2420512      META-INF/native/libcom_google_cloud_hadoop_repackaged_gcs_io_grpc_netty_shaded_netty_tcnative_linux_aarch_64.so
   108608       META-INF/native/libcom_google_cloud_hadoop_repackaged_gcs_io_grpc_netty_shaded_netty_transport_native_epoll_aarch_64.so
   99422        META-INF/native/libcom_google_cloud_hadoop_repackaged_gcs_io_grpc_netty_shaded_netty_transport_native_epoll_x86_64.so
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

