kskalski commented on issue #4841: Issues wth Hadoop batch indexing using S3A in 0.10.1 and hadoop client 2.7.3
URL: https://github.com/apache/incubator-druid/issues/4841#issuecomment-415328785

At the moment my use case is a local indexing task ("index_hadoop", but without a remote Hadoop cluster) with ioConfig type "hadoop" and an input path pointing to an "s3a://bucket/..." location, all running on AWS VMs. The feature I wanted to use is role-based access to S3, which allows reading data from storage using credentials taken from the environment (the EC2 VM). This makes the s3a implementation call out to the AWS SDK, and a version incompatibility arises, with an exception very similar to the one mentioned here, i.e.:

```
Caused by: java.lang.NoSuchMethodError: com.amazonaws.AmazonWebServiceRequest.copyPrivateRequestParameters()Ljava/util/Map;
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3506) ~[?:?]
	at com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1031) ~[?:?]
	at com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:994) ~[?:?]
	at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:297) ~[?:?]
```

I thought that using a newer hadoop-aws.jar would help, but switching to the oldest version newer than 2.7 (that is, 2.9) didn't help by itself. Instead, it produced other errors due to incompatibilities between different jars.

In hindsight, maybe the culprit is not hadoop-aws.jar but simply the fact that hadoop-dependencies/2.7 uses an aws-java-sdk that is too old. Converging it on the same version as the one placed in extensions/druid-hdfs-storage/ might be enough (though it will likely require bringing several other jars in sync between those two directories).

I could try using ``mapreduce.job.classloader = true`` and/or just upgrading the AWS SDK.
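
To check the suspected mismatch before changing anything, the jar versions in the two directories can be compared. The following is only a minimal sketch: it assumes jars follow the usual `artifact-version.jar` naming, the helper names `jar_versions` and `version_conflicts` are made up for illustration, and the two directories are passed on the command line rather than assumed.

```python
import re
import sys
from collections import defaultdict
from pathlib import Path

# Assumes the conventional "<artifact>-<version>.jar" file name,
# e.g. "aws-java-sdk-core-1.2.3.jar" -> ("aws-java-sdk-core", "1.2.3").
JAR_NAME = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d[\w.-]*)\.jar$")


def jar_versions(directory):
    """Map artifact name -> set of versions found under directory (recursively)."""
    found = defaultdict(set)
    for jar in Path(directory).rglob("*.jar"):
        match = JAR_NAME.match(jar.name)
        if match:  # jars without a parsable version are skipped
            found[match["artifact"]].add(match["version"])
    return found


def version_conflicts(dir_a, dir_b, prefix=""):
    """Return {artifact: (versions_in_a, versions_in_b)} for artifacts that
    appear in both directories with different version sets."""
    a, b = jar_versions(dir_a), jar_versions(dir_b)
    return {
        name: (sorted(a[name]), sorted(b[name]))
        for name in sorted(a.keys() & b.keys())
        if name.startswith(prefix) and a[name] != b[name]
    }


if __name__ == "__main__":
    if len(sys.argv) == 3:
        for name, (in_a, in_b) in version_conflicts(sys.argv[1], sys.argv[2]).items():
            print(f"{name}: {in_a} vs {in_b}")
    else:
        print("usage: compare_jars.py <dir_a> <dir_b>")
```

Running it with the hadoop-dependencies directory and extensions/druid-hdfs-storage/ as the two arguments would list every shared artifact whose versions differ, which gives a rough idea of how many jars would have to be brought in sync alongside aws-java-sdk.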
