[jira] [Commented] (HADOOP-16080) hadoop-aws does not work with hadoop-client-api

Steve Loughran (JIRA) Tue, 29 Jan 2019 10:24:15 -0800


    [ 
https://issues.apache.org/jira/browse/HADOOP-16080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755264#comment-16755264
 ]


Steve Loughran commented on HADOOP-16080:
-----------------------------------------

bq. No, I am quite busy 

That is the problem we all have, I'm afraid.

bq.  The envisioned hadoop-cloudstorage artifact seems misaligned with the 
communities and dependencies. 

Why so? 

Spark has a declared dependency on the unshaded hadoop-cloud-storage JAR: 
https://github.com/apache/spark/blob/master/hadoop-cloud/pom.xml#L208; so does 
Tez, and some other projects. Having a shaded offering would only need a change 
in those declarations and cover all the stores.

bq. Seems a better structure would be that hadoop-aws is an independent 
artifact that only uses public/stable hadoop APIs. I took a look at 
SemaphoredDelegatingExecutor and noticed that is marked 
InterfaceAudience.Private, so it seems like hadoop-aws should just not use it

SemaphoredDelegatingExecutor actually arrived in hadoop-aws first, in 
HADOOP-13560; it was pulled up into hadoop-common by HADOOP-15309 so that it 
could be shared by the other object stores. It's private *within Hadoop 
itself*. By tagging it as such, we retain the option of making incompatible 
changes. Similarly, we keep a lot of implementation stuff in hadoop-common, and 
share test suites of FS behaviours in hadoop-common-tests. That keeps 
maintenance costs down (do I really have to have a copy and paste of 
SemaphoredDelegatingExecutor? What about EtagChecksum? Or all the new fs.impl 
stuff I'm adding in HADOOP-15229 for async IO?).

bq.  If I magically had the time I would explore making hadoop-aws more 
independent instead of more dependent.

The other aspect of a shaded cloud module is that it would also be able to hide 
transitive dependencies. 
You've avoided seeing that problem because you already had SLF4J, commons-*, 
etc on the CP, in compatible versions, and because we've switched to the shaded 
AWS SDK, so you don't have to worry about the jackson and httpclient problems, 
which are complex enough that we are going to have to stop making Hadoop 2.7.x 
releases. But hadoop-azure does pass on its unshaded dependencies, as do some 
others, and I do get to deal with those problems. If we can produce a single 
JAR, "depend on this and you won't have classpath problems", people will be 
happy. It is that which tends to be the most traumatic.

> hadoop-aws does not work with hadoop-client-api
> -----------------------------------------------
>
>                 Key: HADOOP-16080
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16080
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 3.1.1
>            Reporter: Keith Turner
>            Priority: Major
>
> I attempted to use Accumulo and S3a with the following jars on the classpath.
>  * hadoop-client-api-3.1.1.jar
>  * hadoop-client-runtime-3.1.1.jar
>  * hadoop-aws-3.1.1.jar
> This failed with the following exception.
> {noformat}
> Exception in thread "init" java.lang.NoSuchMethodError: 
> org.apache.hadoop.util.SemaphoredDelegatingExecutor.<init>(Lcom/google/common/util/concurrent/ListeningExecutorService;IZ)V
>     at org.apache.hadoop.fs.s3a.S3AFileSystem.create(S3AFileSystem.java:769)
>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1169)
>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1149)
>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1108)
>     at org.apache.hadoop.fs.FileSystem.createNewFile(FileSystem.java:1413)
>     at 
> org.apache.accumulo.server.fs.VolumeManagerImpl.createNewFile(VolumeManagerImpl.java:184)
>     at 
> org.apache.accumulo.server.init.Initialize.initDirs(Initialize.java:479)
>     at 
> org.apache.accumulo.server.init.Initialize.initFileSystem(Initialize.java:487)
>     at 
> org.apache.accumulo.server.init.Initialize.initialize(Initialize.java:370)
>     at org.apache.accumulo.server.init.Initialize.doInit(Initialize.java:348)
>     at org.apache.accumulo.server.init.Initialize.execute(Initialize.java:967)
>     at org.apache.accumulo.start.Main.lambda$execKeyword$0(Main.java:129)
>     at java.lang.Thread.run(Thread.java:748)
> {noformat}
> The problem is that {{S3AFileSystem.create()}} looks for 
> {{SemaphoredDelegatingExecutor(com.google.common.util.concurrent.ListeningExecutorService)}}
>  which does not exist in hadoop-client-api-3.1.1.jar.  What does exist is 
> {{SemaphoredDelegatingExecutor(org.apache.hadoop.shaded.com.google.common.util.concurrent.ListeningExecutorService)}}.
> To work around this issue I created a version of hadoop-aws-3.1.1.jar that 
> relocated references to Guava.
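
The mismatch described in the issue text above can be reproduced in miniature. 
The following is only an illustrative sketch, not Hadoop code: the nested types 
UnshadedService, ShadedService and DelegatingExecutor are hypothetical 
stand-ins for the unshaded Guava ListeningExecutorService, its relocated copy, 
and SemaphoredDelegatingExecutor. It assumes nothing beyond the JDK and shows 
that a constructor lookup is keyed on the exact parameter types, so a caller 
compiled against the unshaded type cannot find a constructor declared with the 
relocated one.

```java
import java.lang.reflect.Constructor;

public class DescriptorMismatchDemo {

    // Hypothetical stand-in for com.google.common.util.concurrent.ListeningExecutorService.
    interface UnshadedService {}

    // Hypothetical stand-in for the relocated org.apache.hadoop.shaded.* copy of that type.
    interface ShadedService {}

    // Hypothetical stand-in for SemaphoredDelegatingExecutor as built into a shaded JAR:
    // its only constructor takes the relocated type, then an int and a boolean.
    static class DelegatingExecutor {
        DelegatingExecutor(ShadedService delegate, int permits, boolean fair) {}
    }

    // Returns true if DelegatingExecutor declares a (firstParam, int, boolean) constructor.
    static boolean hasConstructor(Class<?> firstParam) {
        try {
            Constructor<?> c = DelegatingExecutor.class
                    .getDeclaredConstructor(firstParam, int.class, boolean.class);
            return c != null;
        } catch (NoSuchMethodException e) {
            // The reflective analogue of the NoSuchMethodError raised at link time.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("relocated parameter type: " + hasConstructor(ShadedService.class));
        System.out.println("unshaded parameter type:  " + hasConstructor(UnshadedService.class));
    }
}
```

Running main prints true for the relocated parameter type and false for the 
unshaded one. Relocating the Guava references in the caller, as in the 
workaround above, corresponds to switching the lookup to the relocated type.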



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
