At the moment my use case is a local indexing task ("index_hadoop", but without a
remote Hadoop cluster) with an ioConfig of type "hadoop" and an input path pointing to
an "s3a://bucket/..." location, all running on AWS VMs. The feature I want to
use is role-based access to S3, which allows reading data from storage using
credentials taken from the environment (the EC2 VM). This makes the s3a implementation
call out to the AWS SDK, and a version incompatibility arises, manifesting as an
exception very similar to the one mentioned here, i.e.:
```
Caused by: java.lang.NoSuchMethodError: com.amazonaws.AmazonWebServiceRequest.copyPrivateRequestParameters()Ljava/util/Map;
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3506) ~[?:?]
    at com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1031) ~[?:?]
    at com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:994) ~[?:?]
    at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:297) ~[?:?]
```
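For context, a minimal sketch of the kind of ioConfig I mean (the bucket and path
are placeholders, not my real ones); the point is that no fs.s3a.access.key /
fs.s3a.secret.key properties are set, so s3a falls back to the EC2
instance-profile credentials via the AWS SDK:
```json
{
  "ioConfig": {
    "type": "hadoop",
    "inputSpec": {
      "type": "static",
      "paths": "s3a://bucket/path/to/input/*.gz"
    }
  }
}
```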
I thought that using a newer hadoop-aws.jar would help, but the oldest version
newer than 2.7, that is 2.9, didn't help by itself; it just produced other errors
due to incompatibilities between different jars. In hindsight, maybe the
culprit is not hadoop-aws.jar, but simply the fact that hadoop-dependencies/2.7
uses too old an aws-java-sdk, and converging it on the same version as the one placed
in extensions/druid-hdfs-storage/ would be enough (though that will likely require
bringing several other jars in sync between those two directories).
I could try setting ``mapreduce.job.classloader = true`` (see the sketch below)
and/or just upgrading the AWS SDK.
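For reference, a minimal sketch of how that flag could be passed through, assuming
the standard jobProperties map of the Hadoop ingestion spec's tuningConfig (the
fragment is illustrative, not taken from an actual working spec):
```json
{
  "tuningConfig": {
    "type": "hadoop",
    "jobProperties": {
      "mapreduce.job.classloader": "true"
    }
  }
}
```
This should isolate the job's classpath (including its aws-java-sdk) from Druid's
own, which is why it might sidestep the NoSuchMethodError above.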