Joe McDonnell created IMPALA-11125:
--------------------------------------
Summary: Revisit the minimal-s3a-aws-sdk jar
Key: IMPALA-11125
URL: https://issues.apache.org/jira/browse/IMPALA-11125
Project: IMPALA
Issue Type: Improvement
Components: Infrastructure
Affects Versions: Impala 4.1.0
Reporter: Joe McDonnell
The impala-minimal-s3a-aws-sdk jar takes the com.amazonaws aws-java-sdk-bundle
and filters out content that Impala does not need. This filtering shrinks the
jar from 183MB to 89MB.
Unpacking the filtered jar shows that it still contains content that could be
removed. It includes services that we don't use (these may not have been in
the bundle when we first wrote the exclusions):
{noformat}
$ ls com/amazonaws/services | wc -l
116
$ ls com/amazonaws/services
accessanalyzer
acmpca
apigatewaymanagementapi
appconfig
appflow
applicationinsights
appregistry
augmentedairuntime
...{noformat}
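The same listing can be produced without unpacking the jar. The following is a minimal sketch using Python's standard zipfile module; the function names and the keep-set in the usage note are illustrative assumptions, not Impala's actual exclusion list.

```python
import zipfile
from collections import Counter

SERVICES_PREFIX = "com/amazonaws/services/"


def service_entry_counts(jar_path):
    """Count file entries under each com/amazonaws/services/<name>/ directory."""
    counts = Counter()
    with zipfile.ZipFile(jar_path) as jar:
        for info in jar.infolist():
            name = info.filename
            # Skip entries outside the services tree and bare directory entries.
            if not name.startswith(SERVICES_PREFIX) or name.endswith("/"):
                continue
            service, sep, _ = name[len(SERVICES_PREFIX):].partition("/")
            if sep:  # ignore files that sit directly under services/
                counts[service] += 1
    return counts


def unexpected_services(jar_path, keep):
    """Return the sorted service package names that are not in the keep set."""
    return sorted(set(service_entry_counts(jar_path)) - set(keep))
```

For example, unexpected_services(jar_path, {"s3", "dynamodbv2"}) would list every service package other than those two. That keep-set is hypothetical; the real one would have to come from what Impala and its dependencies actually load.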
Separately, the models directory takes up a lot of space:
{noformat}
$ du -ch models
807M models
807M total
$ ls models | wc -l
468
$ ls models
a4b-2017-11-09-intermediate.json
a4b-2017-11-09-model.json
...{noformat}
These are JSON files that compress well, but they still take up space.
We should either revisit our exclusions and try to avoid packaging some of
these models, or we should try to avoid using aws-java-sdk-bundle and instead
pick out individual jars like aws-java-sdk-s3 and aws-java-sdk-dynamodb.
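
Before choosing between those two options, it may help to see how much each top-level directory contributes to the jar, both uncompressed (what du reports after unpacking) and compressed (what counts toward the jar size). This is a rough sketch using Python's standard zipfile module; the function name and the depth parameter are assumptions made for illustration.

```python
import zipfile
from collections import defaultdict


def size_by_prefix(jar_path, depth=1):
    """Sum (uncompressed, compressed) bytes per leading directory prefix."""
    totals = defaultdict(lambda: [0, 0])
    with zipfile.ZipFile(jar_path) as jar:
        for info in jar.infolist():
            if info.is_dir():
                continue
            parts = info.filename.split("/")
            # Group by the first `depth` path components; files that are
            # not nested that deeply are grouped under a shared label.
            if len(parts) > depth:
                key = "/".join(parts[:depth])
            else:
                key = "(top-level files)"
            totals[key][0] += info.file_size      # uncompressed bytes
            totals[key][1] += info.compress_size  # bytes stored in the jar
    return {key: tuple(sizes) for key, sizes in totals.items()}
```

Sorting the result by the compressed figure shows which prefixes (for example models or com) are worth excluding first, since only the compressed size affects the size of the jar itself.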