Shawn Smith created AVRO-1144:
---------------------------------

             Summary: Deadlock with FSInput and Hadoop NativeS3FileSystem.
                 Key: AVRO-1144
                 URL: https://issues.apache.org/jira/browse/AVRO-1144
             Project: Avro
          Issue Type: Bug
          Components: java
    Affects Versions: 1.7.0
         Environment: Hadoop 1.0.3
            Reporter: Shawn Smith


Deadlock can occur when using org.apache.avro.mapred.FsInput to read files from 
S3 using the Hadoop NativeS3FileSystem and multiple threads.

There are a lot of components involved, but the basic cause is pretty simple: 
Apache Commons HttpClient can deadlock waiting for a free HTTP connection when 
the number of threads downloading from S3 is greater than or equal to the 
maximum allowed HTTP connections per host.

I've filed this bug against Avro because the bug is easiest to fix in Avro.  
Swap the order of the FileSystem.open() and FileSystem.getFileStatus() calls in 
the FsInput constructor:
{noformat}
/** Construct given a path and a configuration. */
public FsInput(Path path, Configuration conf) throws IOException {
  this.stream = path.getFileSystem(conf).open(path);
  this.len = path.getFileSystem(conf).getFileStatus(path).getLen();
}
{noformat}
to
{noformat}
/** Construct given a path and a configuration. */
public FsInput(Path path, Configuration conf) throws IOException {
  this.len = path.getFileSystem(conf).getFileStatus(path).getLen();
  this.stream = path.getFileSystem(conf).open(path);
}
{noformat}

Here's what triggers the deadlock:
* FsInput calls FileSystem.open(), which calls Jets3t to connect to S3 and open 
an HTTP connection for downloading content.  This acquires an HTTP connection 
but does not release it.
* FsInput calls FileSystem.getFileStatus(), which calls Jets3t to connect to S3 
and perform a HEAD request to get object metadata.  This attempts to acquire a 
second HTTP connection.
* Jets3t uses Apache Commons HttpClient, which limits the number of 
simultaneous HTTP connections to a given host.  Let's say this maximum is 4 (the 
default).  If 4 threads all call the FsInput constructor concurrently, the 4 
FileSystem.open() calls can acquire all 4 available connections, and the 
FileSystem.getFileStatus() calls then block forever waiting for a thread to 
release an HTTP connection back to the connection pool.
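
The hold-and-wait pattern in the steps above can be sketched without Hadoop or 
Jets3t.  This is only a model, not the real code path: a java.util.concurrent 
Semaphore stands in for the HttpClient connection pool, and the class name 
OpenThenStatDemo, the method openThenStat and the timeout are hypothetical.  The 
real getFileStatus() call waits with no timeout; one is used here only so that 
the demo terminates.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class OpenThenStatDemo {

    /**
     * Models the original constructor order against a pool of permits.
     * Returns true if a second "connection" could be acquired while the
     * first one was still held, false if the wait timed out.
     */
    static boolean openThenStat(Semaphore pool, long timeoutMillis) {
        // Models FileSystem.open(): take a connection and keep holding it.
        pool.acquireUninterruptibly();
        try {
            // Models FileSystem.getFileStatus(): needs a second connection.
            // Assumption: a timeout replaces the unbounded wait of the real code.
            boolean acquired = pool.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
            if (acquired) {
                pool.release(); // the HEAD request has finished
            }
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pool.release(); // models closing the stream
        }
    }

    public static void main(String[] args) {
        // One permit: the second acquire can never succeed while the first is held.
        System.out.println("pool=1: " + openThenStat(new Semaphore(1), 200)); // prints "pool=1: false"
        // Two permits: a single caller gets both connections.
        System.out.println("pool=2: " + openThenStat(new Semaphore(2), 200)); // prints "pool=2: true"
    }
}
```

With N permits and N concurrent callers, every caller can end up holding one 
permit while waiting for a second, which is the situation described above.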

A simple way to reproduce this problem is to create "jets3t.properties" in your 
classpath with "httpclient.max-connections=1".  Then try to open a file using 
FsInput and the Native S3 file system (new Path("s3n://<bucket>/<path>")).  It 
will hang indefinitely inside the FsInput constructor.
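
If it helps, the properties file for the reproduction can be generated rather 
than written by hand.  The sketch below is only a convenience: the class name 
Jets3tPropertiesWriter and the method writeSingleConnectionConfig are 
hypothetical, and it assumes that the directory passed in is on the classpath 
of the program that later opens the s3n:// path.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

public class Jets3tPropertiesWriter {

    /**
     * Writes a jets3t.properties file that limits the pool to one connection.
     * Assumption: classpathDir exists and is on the classpath of the test program.
     */
    static Path writeSingleConnectionConfig(Path classpathDir) {
        Properties props = new Properties();
        props.setProperty("httpclient.max-connections", "1");
        Path file = classpathDir.resolve("jets3t.properties");
        try (OutputStream out = Files.newOutputStream(file)) {
            props.store(out, "Reproduction settings for AVRO-1144");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return file;
    }

    public static void main(String[] args) {
        // Default to the temp directory when no target directory is given.
        String dir = args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir");
        System.out.println("Wrote " + writeSingleConnectionConfig(Paths.get(dir)));
    }
}
```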

Swapping the order of the open() and getFileStatus() calls ensures that a given 
thread using FsInput has at most one outstanding connection to S3 at a time.  As 
a result, one thread should always be able to make progress, avoiding deadlock.
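
The effect of the swapped order can be checked with the same kind of model.  
Again this is a sketch under stated assumptions, not the Avro or Jets3t code: a 
Semaphore stands in for the connection pool, a short sleep stands in for reading 
the file, and the names StatThenOpenDemo and runWorkers are hypothetical.  Each 
worker returns its first connection before asking for the next one, so even a 
pool of one connection lets all workers finish.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class StatThenOpenDemo {

    /** Runs the given number of workers against maxConnections permits; returns how many finished. */
    static int runWorkers(int threads, int maxConnections) {
        Semaphore pool = new Semaphore(maxConnections);
        AtomicInteger completed = new AtomicInteger();
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            executor.execute(() -> {
                // Models getFileStatus(): the connection is returned once the HEAD request ends.
                pool.acquireUninterruptibly();
                pool.release();
                // Models open(): the connection is held until the stream is closed.
                pool.acquireUninterruptibly();
                try {
                    Thread.sleep(10); // stand-in for reading the file
                    completed.incrementAndGet();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    pool.release(); // models closing the stream
                }
            });
        }
        executor.shutdown();
        try {
            executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        executor.shutdownNow();
        return completed.get();
    }

    public static void main(String[] args) {
        // Four workers sharing a single connection all complete.
        System.out.println(runWorkers(4, 1)); // prints 4
    }
}
```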

Here's a sample stack trace of a deadlocked thread:
{noformat}
"pool-10-thread-3" prio=5 tid=11026f800 nid=0x116a04000 in Object.wait() [116a02000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <785892cc0> (a org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool)
        at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.doGetConnection(MultiThreadedHttpConnectionManager.java:518)
        - locked <785892cc0> (a org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool)
        at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.getConnectionWithTimeout(MultiThreadedHttpConnectionManager.java:416)
        at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:153)
        at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
        at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
        at org.jets3t.service.impl.rest.httpclient.RestS3Service.performRequest(RestS3Service.java:357)
        at org.jets3t.service.impl.rest.httpclient.RestS3Service.performRestHead(RestS3Service.java:652)
        at org.jets3t.service.impl.rest.httpclient.RestS3Service.getObjectImpl(RestS3Service.java:1556)
        at org.jets3t.service.impl.rest.httpclient.RestS3Service.getObjectDetailsImpl(RestS3Service.java:1492)
        at org.jets3t.service.S3Service.getObjectDetails(S3Service.java:1793)
        at org.jets3t.service.S3Service.getObjectDetails(S3Service.java:1225)
        at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:111)
        at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        at org.apache.hadoop.fs.s3native.$Proxy25.retrieveMetadata(Unknown Source)
        at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:326)
        at org.apache.avro.mapred.FsInput.<init>(FsInput.java:38)
        at org.apache.crunch.io.avro.AvroFileReaderFactory.read(AvroFileReaderFactory.java:70)
        at org.apache.crunch.io.CompositePathIterable$2.<init>(CompositePathIterable.java:80)
        at org.apache.crunch.io.CompositePathIterable.iterator(CompositePathIterable.java:78)
        at com.example.load.BulkLoader$1.run(BulkLoadCommand.java:109)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:680)
{noformat}


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
