Newbie question: I want to add system/integration tests for the new functionality. There is a set of existing tests around the Spark Catalog that I can leverage. Great. The provider I'm writing is backed by a web service, though, which is part of an AWS account. I can write the tests using a mocked client that somehow clones the behavior of the web service, but I'll get the most value if I actually run the tests against a real AWS Glue account.
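One common pattern for this (a hedged sketch, not how Spark's test suites actually do it; the object, environment-variable names, and messages below are all made up for illustration) is to gate the live-service tests on the presence of an opt-in flag and real credentials, so builds without a Glue account skip them instead of failing:

```scala
// Hypothetical sketch: only run integration tests against the real service when
// the user has explicitly opted in AND credentials are available; otherwise
// fall back to the mocked client. All names here are assumptions.
object GlueTestGate {
  // Pure function over an env map so the gating logic itself is testable.
  def shouldRunLiveTests(env: Map[String, String]): Boolean =
    env.get("GLUE_LIVE_TESTS").contains("true") &&
      env.contains("AWS_ACCESS_KEY_ID")

  def main(args: Array[String]): Unit = {
    if (shouldRunLiveTests(sys.env))
      println("running tests against real AWS Glue")
    else
      println("skipping live Glue tests: using the mocked client")
  }
}
```

With a gate like this, CI runs (which usually have no service credentials) exercise only the mocked path, while developers with an AWS account can opt in to the end-to-end path.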
How do you guys deal with external dependencies for system tests? Is there an AWS account that is used for this purpose by any chance?

Thanks,
-Ameen

From: Steve Loughran <ste...@hortonworks.com>
Date: Tuesday, February 13, 2018 at 5:01 PM
To: "Tayyebi, Ameen" <tayye...@amazon.com>
Cc: Apache Spark Dev <dev@spark.apache.org>
Subject: Re: A new external catalog

On 13 Feb 2018, at 21:20, Tayyebi, Ameen <tayye...@amazon.com> wrote:

Yes, I'm thinking about upgrading to these:

<aws.kinesis.client.version>1.9.0</aws.kinesis.client.version>
<!-- Should be consistent with Kinesis client dependency -->
<aws.java.sdk.version>1.11.272</aws.java.sdk.version>

From:

<aws.kinesis.client.version>1.7.3</aws.kinesis.client.version>
<!-- Should be consistent with Kinesis client dependency -->
<aws.java.sdk.version>1.11.76</aws.java.sdk.version>

1.11.272 is the earliest version that has Glue. How about I let the build system run the tests, and if things start breaking I fall back to shading Glue's specific SDK?

FWIW, some of the other trouble spots are not functional; they're log overflow:

https://issues.apache.org/jira/browse/HADOOP-15040
https://issues.apache.org/jira/browse/HADOOP-14596

My Cloudera collaborators and I are testing the shaded 1.11.271 JAR and will go with that into Hadoop 3.1 if we're happy. That's not so much for new features as for "stack traces throughout the log", which seems to be a recurrent issue with the JARs, and one which often slips by CI build runs.
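For reference, the shading fallback mentioned above could look roughly like the fragment below. This is a hedged sketch only: the relocation pattern and shaded package name are assumptions for illustration, not what Spark or Hadoop actually configure.

```xml
<!-- Hypothetical maven-shade-plugin fragment: relocate the AWS SDK classes so
     their transitive dependencies (e.g. Jackson) cannot clash with Spark's. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <relocations>
      <relocation>
        <pattern>com.amazonaws</pattern>
        <!-- shaded package name is a placeholder -->
        <shadedPattern>org.apache.spark.shaded.com.amazonaws</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
</plugin>
```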
If it wasn't for that, we'd have stuck with 1.11.199, because it didn't have any issues that we hadn't already got under control (https://github.com/aws/aws-sdk-java/issues/1211). Like I said: upgrades bring fear.

From: Steve Loughran <ste...@hortonworks.com>
Date: Tuesday, February 13, 2018 at 3:34 PM
To: "Tayyebi, Ameen" <tayye...@amazon.com>
Cc: Apache Spark Dev <dev@spark.apache.org>
Subject: Re: A new external catalog

On 13 Feb 2018, at 19:50, Tayyebi, Ameen <tayye...@amazon.com> wrote:

The biggest challenge is that I had to upgrade the AWS SDK to a newer version so that it includes the Glue client, since Glue is a new service. So far I haven't seen any jar-hell issues, but that's the main drawback I can see. I've made sure the version is in sync with the Kinesis client used by the spark-streaming module.

Funnily enough, I'm currently updating the s3a troubleshooting doc; the latest version says up front: "Whatever problem you have, changing the AWS SDK version will not fix things, only change the stack traces you see."

https://github.com/steveloughran/hadoop/blob/s3/HADOOP-15076-trouble-and-perf/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/troubleshooting_s3a.md

Upgrading the AWS SDK is, sadly, often viewed with almost the same fear as upgrading Guava, especially if it's the unshaded version, which forces in a version of Jackson.

Which SDK version are you proposing? 1.11.x?