[ https://issues.apache.org/jira/browse/FLUME-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yan Jian updated FLUME-2973:
----------------------------
    Attachment: FLUME-2973-1.patch

I made the smallest possible modification ([^FLUME-2973-1.patch]), steering clear of the inconsistent code formatting and unrelated code smells in the original code. It breaks the {{circular wait}} condition by reversing the lock sequence in the rolling thread, which avoids lock transfer.

On the {{trunk}} branch, these classes ({{BucketWriter}}, {{HDFSEventSink}}, etc.) have been reformatted, but they still contain several code smells:
# Inappropriate naming, such as {{org.apache.flume.sink.hdfs.BucketWriter.closed}}.
# Coupling between the classes {{BucketWriter}} and {{HDFSEventSink}}.
# Meaningless duplicate closing of a bucket writer.

IMHO, it is necessary to refactor the classes in the {{org.apache.flume.sink.hdfs}} package.

> Deadlock in hdfs sink
> ---------------------
>
>                 Key: FLUME-2973
>                 URL: https://issues.apache.org/jira/browse/FLUME-2973
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: v1.7.0
>            Reporter: Denes Arvay
>            Assignee: Denes Arvay
>            Priority: Critical
>              Labels: hdfssink
>         Attachments: FLUME-2973-1.patch, FLUME-2973.patch
>
>
> Automatic close of BucketWriters (when the open file count reaches
> {{hdfs.maxOpenFiles}}) and the file rolling thread can end up in a deadlock.
> When creating a new {{BucketWriter}}, {{HDFSEventSink}} locks
> {{HDFSEventSink.sfWritersLock}}, and the {{close()}} called in
> {{HDFSEventSink.sfWritersLock.removeEldestEntry}} tries to lock the
> {{BucketWriter}} instance.
> On the other hand, if the file is being rolled in
> {{BucketWriter.close(boolean)}}, it locks the {{BucketWriter}} instance first
> and in the close callback it tries to lock the {{sfWritersLock}}.
> The chance of this deadlock is higher when the value of
> {{hdfs.maxOpenFiles}} is low (1).
> Script to reproduce:
> https://gist.github.com/adenes/96503a6e737f9604ab3ee9397a5809ff
> (put it in
> {{flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs}})
> The deadlock usually occurs within ~30 iterations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
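
For illustration only, the lock-order reversal described in the comment can be sketched roughly as below. This is not the attached patch and not the real Flume code: {{LockOrderSketch}}, {{evictEldest}}, {{rollAndRemove}} and {{run}} are hypothetical names, and two plain monitor objects stand in for {{sfWritersLock}} and the {{BucketWriter}} instance. The only point is that once both threads acquire the two locks in the same order, a circular wait cannot form.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in classes and names; not the real Flume implementation.
class LockOrderSketch {
    // Plays the role of HDFSEventSink.sfWritersLock (assumption: a plain monitor).
    private final Object sfWritersLock = new Object();
    // Plays the role of the BucketWriter instance monitor.
    private final Object writer = new Object();
    private final AtomicInteger completed = new AtomicInteger();

    // Sink path: takes sfWritersLock first, then the writer
    // (evicting the eldest entry closes the writer).
    void evictEldest() {
        synchronized (sfWritersLock) {
            synchronized (writer) {
                completed.incrementAndGet();
            }
        }
    }

    // Rolling path with the lock sequence reversed so it matches the sink path.
    // The deadlock-prone variant took `writer` first and sfWritersLock second.
    void rollAndRemove() {
        synchronized (sfWritersLock) {
            synchronized (writer) {
                completed.incrementAndGet();
            }
        }
    }

    // Runs both paths concurrently and returns how many operations finished.
    static int run(int iterations) {
        LockOrderSketch s = new LockOrderSketch();
        CountDownLatch start = new CountDownLatch(1);
        Thread sink = new Thread(() -> loop(start, iterations, s::evictEldest));
        Thread roller = new Thread(() -> loop(start, iterations, s::rollAndRemove));
        // Daemon threads so a (hypothetical) hang cannot keep the JVM alive.
        sink.setDaemon(true);
        roller.setDaemon(true);
        sink.start();
        roller.start();
        start.countDown();
        try {
            sink.join(5000);
            roller.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return s.completed.get();
    }

    private static void loop(CountDownLatch start, int n, Runnable op) {
        try {
            start.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return;
        }
        for (int i = 0; i < n; i++) {
            op.run();
        }
    }

    public static void main(String[] args) {
        System.out.println("completed = " + run(1000));
    }
}
```

Swapping the two {{synchronized}} blocks in {{rollAndRemove}} back to writer-first would reintroduce the opposite acquisition order that the issue describes, which is why a single consistent order is the smallest change that removes the circular wait.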