Thanks a lot, Steve. I’ll go through the JIRAs you linked in detail. I took a 
quick look and am sufficiently scared for now. I had run into that warning from 
the S3 stream before. Sigh.

From: Steve Loughran <ste...@hortonworks.com>
Date: Tuesday, February 13, 2018 at 5:01 PM
To: "Tayyebi, Ameen" <tayye...@amazon.com>
Cc: Apache Spark Dev <dev@spark.apache.org>
Subject: Re: A new external catalog




On 13 Feb 2018, at 21:20, Tayyebi, Ameen <tayye...@amazon.com> wrote:

Yes, I’m thinking about upgrading to these:
<aws.kinesis.client.version>1.9.0</aws.kinesis.client.version>
<!-- Should be consistent with Kinesis client dependency -->
<aws.java.sdk.version>1.11.272</aws.java.sdk.version>

Upgrading from:

<aws.kinesis.client.version>1.7.3</aws.kinesis.client.version>
<!-- Should be consistent with Kinesis client dependency -->
<aws.java.sdk.version>1.11.76</aws.java.sdk.version>

1.11.272 is the earliest SDK version that includes Glue.

How about I let the build system run the tests and if things start breaking I 
fall back to shading Glue’s specific SDK?
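
For illustration, the shading fallback could look roughly like the 
maven-shade-plugin fragment below. This is only a sketch under assumptions: 
the relocation prefix org.example.shaded is a made-up placeholder, and the 
set of SDK artifacts that would actually need to be bundled has not been 
checked against the build.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <artifactSet>
          <includes>
            <!-- Assumption: only the Glue client and the SDK core are bundled -->
            <include>com.amazonaws:aws-java-sdk-glue</include>
            <include>com.amazonaws:aws-java-sdk-core</include>
          </includes>
        </artifactSet>
        <relocations>
          <relocation>
            <!-- Move the bundled SDK classes so they cannot clash with the
                 SDK version the Kinesis client pulls in -->
            <pattern>com.amazonaws</pattern>
            <!-- Hypothetical target package, chosen for illustration only -->
            <shadedPattern>org.example.shaded.com.amazonaws</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With a relocation like this, the Glue catalog code would use its own private 
copy of the SDK, and the aws.java.sdk.version property could stay at its 
current value for everything else.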


FWIW, some of the other trouble spots are not functional problems; they're log overflow:

https://issues.apache.org/jira/browse/HADOOP-15040
https://issues.apache.org/jira/browse/HADOOP-14596

Cloudera collaborators and I are testing the shaded 1.11.271 JAR and will go 
with that into Hadoop 3.1 if we're happy. That's not so much for new features 
as to deal with "stack traces throughout the log", which seems to be a 
recurrent issue with the JARs, and one which often slips by CI build runs. If 
it wasn't for that, we'd have stuck with 1.11.199, because it didn't have any 
issues that we hadn't already got under control 
(https://github.com/aws/aws-sdk-java/issues/1211)

Like I said: upgrades bring fear


From: Steve Loughran <ste...@hortonworks.com>
Date: Tuesday, February 13, 2018 at 3:34 PM
To: "Tayyebi, Ameen" <tayye...@amazon.com>
Cc: Apache Spark Dev <dev@spark.apache.org>
Subject: Re: A new external catalog





On 13 Feb 2018, at 19:50, Tayyebi, Ameen <tayye...@amazon.com> wrote:


The biggest challenge is that I had to upgrade the AWS SDK to a newer version 
so that it includes the Glue client since Glue is a new service. So far, I 
haven’t seen any jar hell issues, but that’s the main drawback I can see. I’ve 
made sure the version is in sync with the Kinesis client used by the 
spark-streaming module.

Funnily enough, I'm currently updating the s3a troubleshooting doc; the latest 
version says up front:

"Whatever problem you have, changing the AWS SDK version will not fix things, 
only change the stack traces you see."

https://github.com/steveloughran/hadoop/blob/s3/HADOOP-15076-trouble-and-perf/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/troubleshooting_s3a.md

Upgrading AWS SDKs is, sadly, often viewed with almost the same fear as guava, 
especially if it's the unshaded version which forces in a version of jackson.

Which SDK version are you proposing? 1.11.x ?

