[ https://issues.apache.org/jira/browse/HBASE-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14050538#comment-14050538 ]
Vladimir Rodionov commented on HBASE-7336: ------------------------------------------ I am looking into this stuff now, trying to figure out how to make parallel scan on a region more efficient. The code in AbstractFSReader looks dangerous and does not provide any benefits in terms of MT performance. {code} protected int readAtOffset(FSDataInputStream istream, byte[] dest, int destOffset, int size, boolean peekIntoNextBlock, long fileOffset, boolean pread) throws IOException { if (peekIntoNextBlock && destOffset + size + hdrSize > dest.length) { // We are asked to read the next block's header as well, but there is // not enough room in the array. throw new IOException("Attempted to read " + size + " bytes and " + hdrSize + " bytes of next header into a " + dest.length + "-byte array at offset " + destOffset); } if (!pread && streamLock.tryLock()) { // Seek + read. Better for scanning. try { istream.seek(fileOffset); long realOffset = istream.getPos(); if (realOffset != fileOffset) { throw new IOException("Tried to seek to " + fileOffset + " to " + "read " + size + " bytes, but pos=" + realOffset + " after seek"); } if (!peekIntoNextBlock) { IOUtils.readFully(istream, dest, destOffset, size); return -1; } // Try to read the next block header. if (!readWithExtra(istream, dest, destOffset, size, hdrSize)) return -1; } finally { streamLock.unlock(); } } else { // Positional read. Better for random reads; or when the streamLock is already locked. int extraSize = peekIntoNextBlock ? hdrSize : 0; int ret = istream.read(fileOffset, dest, destOffset, size + extraSize); if (ret < size) { throw new IOException("Positional read of " + size + " bytes " + "failed at offset " + fileOffset + " (returned " + ret + ")"); } if (ret == size || ret < size + extraSize) { // Could not read the next block's header, or did not try. return -1; } } assert peekIntoNextBlock; return Bytes.toInt(dest, destOffset + size + BlockType.MAGIC_LENGTH) + hdrSize; } {code} Positional reads in FSInputStream (DFSInputStream) are heavily synchronized. It is lock on stream than seek and read, unlock. Here is the code for FSInputStream: {code} @Override public int read(long position, byte[] buffer, int offset, int length) throws IOException { synchronized (this) { long oldPos = getPos(); int nread = -1; try { seek(position); nread = read(buffer, offset, length); } finally { seek(oldPos); } return nread; } } {code} DFSInputStream extends FSInputStream but does not override the above method. Taking into account that code is synchronized, it is hard to explain observed performance improvement published in this JIRA. > HFileBlock.readAtOffset does not work well with multiple threads > ---------------------------------------------------------------- > > Key: HBASE-7336 > URL: https://issues.apache.org/jira/browse/HBASE-7336 > Project: HBase > Issue Type: Sub-task > Components: Performance > Reporter: Lars Hofhansl > Assignee: Lars Hofhansl > Priority: Critical > Fix For: 0.94.4, 0.95.0 > > Attachments: 7336-0.94.txt, 7336-0.96.txt > > > HBase grinds to a halt when many threads scan along the same set of blocks > and neither read short circuit is nor block caching is enabled for the dfs > client ... disabling the block cache makes sense on very large scans. > It turns out that synchronizing in istream in HFileBlock.readAtOffset is the > culprit. -- This message was sent by Atlassian JIRA (v6.2#6252)