In general, the DN does not read block files under a big lock. The lock is only needed to protect the replica map and some of the block state. This lock hasn't really been a big problem in the past, and I would hesitate to add complexity here (although I haven't thought about it that hard at all, so maybe I'm wrong!).
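To make that concrete, a simplified sketch of the pattern (the names here are only illustrative, not the actual FsDatasetImpl code) would look something like this: the lock covers only the replica lookup, and the actual file I/O runs with no lock held.

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.util.HashMap;
    import java.util.Map;

    // Simplified illustration only; not the actual FsDatasetImpl code.
    class DatasetSketch {
      private final Map<Long, File> replicaMap = new HashMap<>();

      // Short critical section: the lock covers only the map lookup.
      synchronized File getBlockFile(long blockId) throws IOException {
        File f = replicaMap.get(blockId);
        if (f == null) {
          throw new IOException("Replica not found: " + blockId);
        }
        return f;
      }

      // The bulk of the work (streaming the block data) runs with no lock held.
      void sendBlock(long blockId, OutputStream out) throws IOException {
        File blockFile = getBlockFile(blockId);   // lock held only inside this call
        try (InputStream in = new FileInputStream(blockFile)) {
          byte[] buf = new byte[64 * 1024];
          int n;
          while ((n = in.read(buf)) > 0) {
            out.write(buf, 0, n);
          }
        }
      }
    }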
Are you sure that you are not hitting HDFS-7489? In general, the client normally does some readahead of a few KB to avoid swamping the DN with tons of tiny requests. Tons of tiny requests are a bad idea for many other reasons (RPC overhead, seek overhead, etc.). You can also look into using short-circuit reads to avoid the DataNode overhead altogether for local reads, which a lot of high-performance systems do.

regards,
Colin

On Sat, Feb 14, 2015 at 10:43 PM, Sukunhui (iEBP) <sukun...@huawei.com> wrote:
> I have a cluster that writes/reads/deletes lots of small files.
> I dumped the stack of one Datanode and found that the Datanode has more than
> 100 sessions for reading/writing blocks: 100+ DataXceiver threads waiting to
> lock <0x00007f9b26ce9530> (a
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>
> I find that FsDatasetImpl.java and ReplicaMap.java use the `synchronized`
> keyword heavily for synchronization. It's horrible.
> First, locking for every read is unnecessary and decreases concurrency.
> Second, Java monitors (synchronized/wait/notify/notifyAll) are non-fair
> (http://stackoverflow.com/questions/11275699/synchronized-release-order), which
> will cause many DFSClient timeouts.
>
> I'm thinking we can use ReentrantReadWriteLock for synchronization. What do
> you guys think?
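For reference, a minimal sketch of what the proposed ReentrantReadWriteLock approach could look like (class and field names here are illustrative, not the actual ReplicaMap code); constructing the lock with fair=true would also address the ordering concern, though fair locks generally have lower throughput than non-fair ones:

    import java.util.HashMap;
    import java.util.Map;
    import java.util.concurrent.locks.ReentrantReadWriteLock;

    // Illustrative sketch only; not the actual ReplicaMap implementation.
    class ReplicaMapSketch {
      // Passing "true" requests a fair lock, so waiting threads acquire it
      // roughly in arrival order (unlike a plain synchronized monitor).
      private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);
      private final Map<Long, Object> map = new HashMap<>();

      // Lookups take the read lock, so many readers can proceed concurrently.
      Object get(long blockId) {
        lock.readLock().lock();
        try {
          return map.get(blockId);
        } finally {
          lock.readLock().unlock();
        }
      }

      // Updates take the write lock, which is exclusive.
      void put(long blockId, Object replicaInfo) {
        lock.writeLock().lock();
        try {
          map.put(blockId, replicaInfo);
        } finally {
          lock.writeLock().unlock();
        }
      }
    }

With this shape, concurrent readers would no longer serialize on a single monitor, and writers would only block while the map itself is being updated.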