Newbie question:

I want to add system/integration tests for the new functionality. There is a 
set of existing tests around the Spark Catalog that I can leverage. Great. The 
provider I’m writing, though, is backed by a web service that is part of an AWS 
account. I can write the tests using a mocked client that clones the behavior 
of the web service, but I’ll get the most value if I actually run the tests 
against a real AWS Glue account.
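
To make that concrete, what I have in mind is roughly a suite that cancels 
itself unless a live account is configured, something like the sketch below. 
The suite name, the environment variable, and the catalog wiring are all 
placeholders for illustration, not the real provider plumbing:

  import org.apache.spark.sql.SparkSession
  import org.scalatest.FunSuite

  // Sketch only: names and wiring are placeholders, not the actual plugin.
  class GlueCatalogIntegrationSuite extends FunSuite {

    test("list databases against a live Glue catalog") {
      // Cancel (rather than fail) when no live AWS account is configured,
      // so the suite stays green on machines without credentials.
      assume(sys.env.contains("AWS_GLUE_INTEGRATION_TESTS"),
        "set AWS_GLUE_INTEGRATION_TESTS plus AWS credentials to run this test")

      val spark = SparkSession.builder()
        .master("local[2]")
        .appName("glue-catalog-integration-test")
        .getOrCreate()
      try {
        // Placeholder assertion; a real test would exercise the Glue-backed
        // external catalog rather than the default one.
        assert(spark.catalog.listDatabases().count() >= 0)
      } finally {
        spark.stop()
      }
    }
  }

That still leaves the question of where the credentials come from.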

How do you guys deal with external dependencies for system tests? Is there an 
AWS account that is used for this purpose by any chance?

Thanks,
-Ameen

From: Steve Loughran <ste...@hortonworks.com>
Date: Tuesday, February 13, 2018 at 5:01 PM
To: "Tayyebi, Ameen" <tayye...@amazon.com>
Cc: Apache Spark Dev <dev@spark.apache.org>
Subject: Re: A new external catalog




On 13 Feb 2018, at 21:20, Tayyebi, Ameen <tayye...@amazon.com> wrote:

Yes, I’m thinking about upgrading to these:
<aws.kinesis.client.version>1.9.0</aws.kinesis.client.version>
<!-- Should be consistent with Kinesis client dependency -->
<aws.java.sdk.version>1.11.272</aws.java.sdk.version>

From:

<aws.kinesis.client.version>1.7.3</aws.kinesis.client.version>
<!-- Should be consistent with Kinesis client dependency -->
<aws.java.sdk.version>1.11.76</aws.java.sdk.version>

1.11.272 is the earliest SDK release that includes Glue.
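
For reference, the Glue client that only appears in those newer SDKs would be 
used roughly like this; this is just a smoke-test sketch relying on the default 
credential and region provider chains, not code from the actual patch:

  import com.amazonaws.services.glue.AWSGlueClientBuilder
  import com.amazonaws.services.glue.model.GetDatabasesRequest
  import scala.collection.JavaConverters._

  object GlueSmokeCheck {
    def main(args: Array[String]): Unit = {
      // Compiles only against aws-java-sdk 1.11.272 or later, where the
      // Glue client classes first show up.
      val glue = AWSGlueClientBuilder.defaultClient()
      val databases =
        glue.getDatabases(new GetDatabasesRequest()).getDatabaseList.asScala
      databases.foreach(db => println(db.getName))
    }
  }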

How about I let the build system run the tests, and if things start breaking I 
fall back to shading Glue’s specific SDK? A rough sketch of that fallback is 
below.
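
If it does come to that, the shading I’m picturing is a maven-shade-plugin 
relocation along these lines; the relocation prefix and the exact module layout 
are placeholders for illustration, not an agreed design:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <artifactSet>
          <includes>
            <!-- Pull in only the Glue module of the SDK -->
            <include>com.amazonaws:aws-java-sdk-glue</include>
          </includes>
        </artifactSet>
        <relocations>
          <relocation>
            <!-- Hide the Glue classes behind a Spark-private package -->
            <pattern>com.amazonaws.services.glue</pattern>
            <shadedPattern>org.apache.spark.shaded.com.amazonaws.services.glue</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>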


FWIW, some of the other trouble spots are not functional problems; they're log overflow:

https://issues.apache.org/jira/browse/HADOOP-15040
https://issues.apache.org/jira/browse/HADOOP-14596

Cloudera collaborators and I are testing the shaded 1.11.271 JAR and will go 
with that into Hadoop 3.1 if we're happy with it. That's driven not so much by 
new features as by "stack traces throughout the log", which seems to be a 
recurring issue with these JARs, and one which often slips past CI build runs. 
If it weren't for that, we'd have stuck with 1.11.199, because it didn't have 
any issues we hadn't already got under control 
(https://github.com/aws/aws-sdk-java/issues/1211).

Like I said: upgrades bring fear


From: Steve Loughran <ste...@hortonworks.com>
Date: Tuesday, February 13, 2018 at 3:34 PM
To: "Tayyebi, Ameen" <tayye...@amazon.com>
Cc: Apache Spark Dev <dev@spark.apache.org>
Subject: Re: A new external catalog





On 13 Feb 2018, at 19:50, Tayyebi, Ameen <tayye...@amazon.com> wrote:


The biggest challenge is that I had to upgrade the AWS SDK to a newer version 
so that it includes the Glue client, since Glue is a new service. So far, I 
haven’t seen any JAR-hell issues, but that’s the main drawback I can see. I’ve 
made sure the version is in sync with the Kinesis client used by the 
spark-streaming module.

Funnily enough, I'm currently updating the s3a troubleshooting doc; the latest 
version says up front:

"Whatever problem you have, changing the AWS SDK version will not fix things, 
only change the stack traces you see."

https://github.com/steveloughran/hadoop/blob/s3/HADOOP-15076-trouble-and-perf/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/troubleshooting_s3a.md

Upgrading AWS SDKs is, sadly, often viewed with almost the same fear as Guava 
upgrades, especially if it's the unshaded version, which forces in its own 
version of Jackson.

Which SDK version are you proposing? 1.11.x?

