Joe McDonnell created IMPALA-11125:
--------------------------------------

             Summary: Revisit the minimal-s3a-aws-sdk jar
                 Key: IMPALA-11125
                 URL: https://issues.apache.org/jira/browse/IMPALA-11125
             Project: IMPALA
          Issue Type: Improvement
          Components: Infrastructure
    Affects Versions: Impala 4.1.0
            Reporter: Joe McDonnell


The impala-minimal-s3a-aws-sdk jar repackages the com.amazonaws
aws-java-sdk-bundle with unneeded content filtered out. That filtering
shrinks the jar from 183MB to 89MB.

Unpacking the minimal jar shows that it still contains content that could be
removed. It includes services that we don't use (some of which may not have
existed when the exclusions were first written):
{noformat}
$ ls com/amazonaws/services | wc -l
116
$ ls com/amazonaws/services
accessanalyzer
acmpca
apigatewaymanagementapi
appconfig
appflow
applicationinsights
appregistry
augmentedairuntime
...{noformat}
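
To see which of these service directories are worth excluding, one option is to total the bytes each one occupies inside the jar without unpacking it. The following is a minimal sketch, not part of the Impala build: the function name service_sizes is hypothetical, and it assumes the jar keeps the com/amazonaws/services/ layout shown above.

```python
import zipfile
from collections import Counter

# Layout assumption: services live under this prefix, as in the listing above.
SERVICES_PREFIX = "com/amazonaws/services/"


def service_sizes(jar_path, prefix=SERVICES_PREFIX):
    """Return a Counter mapping service directory name -> compressed bytes.

    Hypothetical helper for illustration only.
    """
    sizes = Counter()
    with zipfile.ZipFile(jar_path) as jar:
        for info in jar.infolist():
            if info.is_dir() or not info.filename.startswith(prefix):
                continue
            rest = info.filename[len(prefix):]
            if "/" not in rest:
                # A file directly under services/, not inside a service dir.
                continue
            service = rest.split("/", 1)[0]
            # compress_size is what the entry actually costs in the jar.
            sizes[service] += info.compress_size
    return sizes
```

Calling service_sizes(path).most_common(20) on the built jar would list the twenty largest service directories, which is probably where any new exclusions should start.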
Separately, the models directory takes up a lot of space:
{noformat}
$ du -ch models
807M    models
807M    total
$ ls models | wc -l
468
$ ls models
a4b-2017-11-09-intermediate.json
a4b-2017-11-09-model.json
...{noformat}
These are JSON files that compress well, but they still take up space.
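
Because du reports the unpacked size, it may help to compare that against what the models entries cost inside the jar itself. A rough sketch under the assumption that the entries sit under a top-level models/ prefix, as in the listing above; prefix_footprint is a hypothetical name, not an existing Impala script.

```python
import zipfile


def prefix_footprint(jar_path, prefix="models/"):
    """Return (file_count, uncompressed_bytes, compressed_bytes) for a prefix.

    Hypothetical helper for illustration only.
    """
    count = raw = packed = 0
    with zipfile.ZipFile(jar_path) as jar:
        for info in jar.infolist():
            if info.is_dir() or not info.filename.startswith(prefix):
                continue
            count += 1
            raw += info.file_size        # size once unpacked (what du sees)
            packed += info.compress_size  # size as stored in the jar
    return count, raw, packed
```

The compressed total is the number that matters for the size of the shipped jar, while the uncompressed total matters wherever the jar gets unpacked.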

We should either revisit our exclusions and try to avoid packaging some of 
these models, or we should try to avoid using aws-java-sdk-bundle and instead 
pick out individual jars like aws-java-sdk-s3 and aws-java-sdk-dynamodb.
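
For the first option, a keep-list makes the exclusion review mechanical: anything not on the list is a candidate for removal. The sketch below is only illustrative. It takes a mapping of service directory name to size in bytes (however that was measured), and the KEEP set is an assumption rather than a verified list of what Impala needs; s3 and dynamodbv2 are used here as the package directories corresponding to aws-java-sdk-s3 and aws-java-sdk-dynamodb.

```python
# Assumed keep-list for illustration; the real set of required services
# would need to be confirmed against what S3A actually loads.
KEEP = {"s3", "dynamodbv2"}


def exclusion_candidates(service_bytes, keep=KEEP):
    """Return (sorted candidate names, total bytes they occupy).

    Hypothetical helper: service_bytes maps directory name -> size in bytes.
    """
    candidates = sorted(name for name in service_bytes if name not in keep)
    saved = sum(service_bytes[name] for name in candidates)
    return candidates, saved
```

If the candidate list turns out to cover nearly everything in the jar, that would be an argument for the second option of depending on the individual jars directly.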



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
