cnauroth commented on PR #8243: URL: https://github.com/apache/hadoop/pull/8243#issuecomment-3880796276
> what about _not shading these_?
>
> BTW, what's the size of the distribution tar on 3.5.0 as gcs and cos both bundle a lot...even after stripping out aws bundle I worry we're at risk of crossing that 1 GB threshold

I'm getting ~500 MB for the distribution tarball. See below for some breakdown of the heaviest hitters. hadoop-gcp is one of the top ones. Within hadoop-gcp, the biggest contributors are Protobuf, Guava, gRPC and Netty native binaries.

From history working on gs:// in the previous Google-owned repo, we've found that we needed full shading to guarantee compatibility with user workloads. The biggest challenge is alignment on the Protobuf and Guava version, as the underlying GCS SDK moves those along at its own pace.

```
# share/hadoop directories, size descending
> du -b -s share/hadoop/* | sort -nr | head
187592533  share/hadoop/yarn
156284851  share/hadoop/common
105883478  share/hadoop/client
83782079   share/hadoop/hdfs
29008259   share/hadoop/mapreduce
22844157   share/hadoop/tools
```

```
# Biggest jar sizes, descending
> for x in $(find . -name '*.jar'); do du -b $x; done | sort -nr | head
55992250  ./share/hadoop/client/hadoop-client-minicluster-3.5.0-SNAPSHOT.jar
35393191  ./share/hadoop/common/lib/hadoop-gcp-3.5.0-SNAPSHOT.jar
34558357  ./share/hadoop/yarn/timelineservice/lib/hbase-shaded-client-byo-hadoop-2.6.3-hadoop3.jar
30080918  ./share/hadoop/client/hadoop-client-runtime-3.5.0-SNAPSHOT.jar
19810310  ./share/hadoop/client/hadoop-client-api-3.5.0-SNAPSHOT.jar
9204801   ./share/hadoop/tools/lib/kafka-clients-3.9.0.jar
8451859   ./share/hadoop/common/lib/bcprov-jdk18on-1.82.jar
6687870   ./share/hadoop/tools/lib/zstd-jni-1.5.6-4.jar
6557826   ./share/hadoop/hdfs/hadoop-hdfs-3.5.0-SNAPSHOT-tests.jar
6459241   ./share/hadoop/hdfs/hadoop-hdfs-3.5.0-SNAPSHOT.jar
```

```
# Within hadoop-gcp, Protobuf and Guava are the biggest contributors
> du -b -s com/google/cloud/hadoop/repackaged/ossgcs/com/google/* | sort -nr
5792042  com/google/cloud/hadoop/repackaged/ossgcs/com/google/protobuf
5082272  com/google/cloud/hadoop/repackaged/ossgcs/com/google/common
4900043  com/google/cloud/hadoop/repackaged/ossgcs/com/google/cloud
4559447  com/google/cloud/hadoop/repackaged/ossgcs/com/google/api
4302946  com/google/cloud/hadoop/repackaged/ossgcs/com/google/storage
1042388  com/google/cloud/hadoop/repackaged/ossgcs/com/google/monitoring
859338   com/google/cloud/hadoop/repackaged/ossgcs/com/google/auth
727606   com/google/cloud/hadoop/repackaged/ossgcs/com/google/rpc
589946   com/google/cloud/hadoop/repackaged/ossgcs/com/google/gson
499635   com/google/cloud/hadoop/repackaged/ossgcs/com/google/longrunning
451096   com/google/cloud/hadoop/repackaged/ossgcs/com/google/iam
285057   com/google/cloud/hadoop/repackaged/ossgcs/com/google/re2j
85889    com/google/cloud/hadoop/repackaged/ossgcs/com/google/type
```

```
# Also gRPC
> du -b -s com/google/cloud/hadoop/repackaged/ossgcs/io/* | sort -nr
63209125  com/google/cloud/hadoop/repackaged/ossgcs/io/grpc
1857154   com/google/cloud/hadoop/repackaged/ossgcs/io/opentelemetry
948329    com/google/cloud/hadoop/repackaged/ossgcs/io/opencensus
16788     com/google/cloud/hadoop/repackaged/ossgcs/io/perfmark
```

```
# Also the Netty native binaries
> du -b -s META-INF/native/* | sort -nr
2867712  META-INF/native/com_google_cloud_hadoop_repackaged_gcs_io_grpc_netty_shaded_netty_tcnative_windows_x86_64.dll
2684992  META-INF/native/libcom_google_cloud_hadoop_repackaged_gcs_io_grpc_netty_shaded_netty_tcnative_osx_x86_64.jnilib
2684104  META-INF/native/libcom_google_cloud_hadoop_repackaged_gcs_io_grpc_netty_shaded_netty_tcnative_linux_x86_64.so
2437120  META-INF/native/libcom_google_cloud_hadoop_repackaged_gcs_io_grpc_netty_shaded_netty_tcnative_osx_aarch_64.jnilib
2420512  META-INF/native/libcom_google_cloud_hadoop_repackaged_gcs_io_grpc_netty_shaded_netty_tcnative_linux_aarch_64.so
108608   META-INF/native/libcom_google_cloud_hadoop_repackaged_gcs_io_grpc_netty_shaded_netty_transport_native_epoll_aarch_64.so
99422    META-INF/native/libcom_google_cloud_hadoop_repackaged_gcs_io_grpc_netty_shaded_netty_transport_native_epoll_x86_64.so
```
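
For anyone who wants to repeat the jar-size listing on their own build, here is a minimal sketch of the same measurement wrapped in a function. The name `largest_jars` is hypothetical and not part of the PR. It assumes bash and GNU `du` (the `-b` flag is a GNU extension), and it uses `find -exec` rather than a `for` loop over `$(find ...)` so that paths containing spaces are handled.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the PR: print the N largest jars under a directory.
# Assumes GNU du; -b reports apparent size in bytes.
largest_jars() {
  local dir="${1:-.}"     # directory to scan, defaults to the current one
  local count="${2:-10}"  # number of entries to print, defaults to 10
  # Size every jar in one pass, then sort numerically, largest first.
  find "$dir" -type f -name '*.jar' -exec du -b {} + | sort -nr | head -n "$count"
}
```

Running `largest_jars share/hadoop 10` from the root of an unpacked distribution should give output in the same shape as the "Biggest jar sizes" listing, though the exact numbers depend on the build.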
