I have a cluster that writes/reads/deletes lots of small files. I dumped the stack of one Datanode and found that it has more than 100 DataXceiver threads for reading/writing blocks, all waiting to lock <0x00007f9b26ce9530> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl).
I found that FsDatasetImpl.java and ReplicaMap.java rely heavily on the `synchronized` keyword for synchronization, which is problematic. First, taking an exclusive lock for every read is unnecessary and decreases concurrency. Second, Java monitors (synchronized/wait/notify/notifyAll) are non-fair (http://stackoverflow.com/questions/11275699/synchronized-release-order), which causes many DFSClient timeouts. I'm thinking we could use ReentrantReadWriteLock for synchronization instead. What do you guys think?
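To make the idea concrete, here is a minimal sketch of the pattern I have in mind: a map guarded by a fair `ReentrantReadWriteLock` instead of `synchronized`. The class and method names below are illustrative only, not the actual FsDatasetImpl/ReplicaMap API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch, not the real ReplicaMap: shows reads sharing a
// read lock while writes keep exclusive access.
public class ReplicaMapSketch {
    // `true` requests fair ordering, so waiting threads acquire the
    // lock roughly in arrival order (unlike a Java monitor).
    private final ReadWriteLock lock = new ReentrantReadWriteLock(true);
    private final Map<Long, String> replicas = new HashMap<>();

    // Readers share the read lock, so concurrent lookups no longer serialize.
    public String get(long blockId) {
        lock.readLock().lock();
        try {
            return replicas.get(blockId);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Writers take the exclusive write lock, preserving the old
    // mutual-exclusion semantics for mutations.
    public void add(long blockId, String replicaInfo) {
        lock.writeLock().lock();
        try {
            replicas.put(blockId, replicaInfo);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

With fair mode enabled, a long queue of blocked DataXceiver threads would drain in order instead of some of them starving until the client times out; the trade-off is that fair locks have somewhat lower throughput than non-fair ones.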