[jira] [Commented] (HDFS-4190) Read complete block into memory once in BlockScanning and reduce concurrent disk access

Uma Maheswara Rao G (JIRA) Thu, 15 Nov 2012 10:56:14 -0800

    [ https://issues.apache.org/jira/browse/HDFS-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13498242#comment-13498242 ]


Uma Maheswara Rao G commented on HDFS-4190:
-------------------------------------------

Yes Todd, I have already tried #a, but with the complete block as a single 
packet. Using memory-mapped buffers gave better results than that, but both 
approaches show an improvement. I think we can do what local reads do and use 
a direct ByteBuffer of a bigger size (maybe equal to the block size). 
Compared with memory-mapped buffers, this also avoids the buffer cleanup 
step: since scanning runs sequentially in a single thread, we can reuse the 
same buffer.
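
To make the reusable direct buffer idea concrete, here is a minimal sketch. 
It is not the DataNode scanner code: the class name ReusableBlockReader, the 
scan method, and the use of java.util.zip.CRC32 with one checksum per chunk 
are all assumptions made for illustration. The buffer is allocated once and 
reused for every block, which is only safe because a single thread scans 
sequentially.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.zip.CRC32;

/** Hypothetical sketch; not the actual block scanner implementation. */
final class ReusableBlockReader {
    // Allocated once and reused for every block. Safe only because one
    // thread scans blocks one after another.
    private final ByteBuffer buffer;

    ReusableBlockReader(int maxBlockSize) {
        this.buffer = ByteBuffer.allocateDirect(maxBlockSize);
    }

    /** Reads the whole file into the reused buffer, then returns one CRC32 per chunk. */
    long[] scan(Path blockFile, int bytesPerChunk) throws IOException {
        buffer.clear(); // reuse: reset position and limit, no reallocation
        try (FileChannel ch = FileChannel.open(blockFile, StandardOpenOption.READ)) {
            if (ch.size() > buffer.capacity()) {
                throw new IOException("block is larger than the reusable buffer");
            }
            // Keep reading until the buffer is full or end of file is reached.
            while (buffer.hasRemaining() && ch.read(buffer) != -1) {
                // loop body intentionally empty
            }
        }
        buffer.flip(); // switch from filling the buffer to reading it

        int chunks = (buffer.remaining() + bytesPerChunk - 1) / bytesPerChunk;
        long[] sums = new long[chunks];
        CRC32 crc = new CRC32();
        for (int i = 0; i < chunks; i++) {
            int len = Math.min(bytesPerChunk, buffer.remaining());
            ByteBuffer chunk = buffer.slice(); // view starting at the current position
            chunk.limit(len);
            crc.reset();
            crc.update(chunk);
            sums[i] = crc.getValue();
            buffer.position(buffer.position() + len);
        }
        return sums;
    }
}
```

A caller would create one ReusableBlockReader sized to the configured block 
size and call scan for each block file in turn, comparing the returned values 
with the stored checksums.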

{quote}
faster CRC verification
{quote}
Faster CRC may improve scanning speed. But the overhead I am looking at here 
is seeks, because the scanner currently loads 4096 bytes at a time, I think. 
Correct me if I misunderstood your point.
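
As a rough illustration of that concern, the sketch below counts how many 
read calls are needed to cover one block at a given buffer size. The class 
and method names are hypothetical, and the 64 MB block size in the usage note 
is an assumed value, not one taken from this issue. The result is not a seek 
count: it only bounds how many times other disk I/O can get in between two 
reads of the same block.

```java
/** Hypothetical helper for a back-of-the-envelope estimate. */
final class ReadCallEstimate {
    /** Number of read calls needed to cover blockSize bytes with the given buffer size. */
    static long readCalls(long blockSize, long bufferSize) {
        return (blockSize + bufferSize - 1) / bufferSize; // ceiling division
    }
}
```

For an assumed 64 MB block, readCalls(64L * 1024 * 1024, 4096) gives 16384 
reads, while a buffer equal to the block size gives 1.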



                
> Read complete block into memory once in BlockScanning and reduce concurrent 
> disk access
> ---------------------------------------------------------------------------------------
>
>                 Key: HDFS-4190
>                 URL: https://issues.apache.org/jira/browse/HDFS-4190
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node
>    Affects Versions: 3.0.0
>            Reporter: Uma Maheswara Rao G
>
> When we perform bulk write operations to DFS, we observed that block 
> scanning is a bottleneck for concurrent disk access.
> To see the real load on the disks, use a single data node with a local 
> client flushing data to DFS.
> When we switched off block scanning, we saw a >10% improvement. I will 
> post real figures in a comment.
> Even though I am doing only write operations, there is implicitly a read 
> operation for each block due to block scanning. The next scan happens only 
> after 21 days, but one scan happens right after the block is added. This 
> causes the concurrent disk access.
> Another point to note is that block scanning also reads the block packet by 
> packet. Since we know we have to read and scan the complete block, would it 
> be better to load the complete block once and verify the checksums against 
> that data?
> I tried this with memory-mapped buffers: 
> I mapped the complete block once in block scanning and did the checksum 
> verification on that. I saw a good improvement in the bulk write scenario.
> But we don't have any API to clean the mapped buffer immediately. In my 
> experiment I just used the Cleaner class from the sun package. That would 
> not be correct to use in production, so we would have to write a JNI call 
> to clean the mmapped buffer.
> I am not sure whether I missed something here. Please correct me if I 
> missed any points.
> Thoughts?
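
The memory-mapped experiment described in the issue text above could look 
roughly like the sketch below. It is an illustration under assumptions, not 
the code that was actually tested: the class name MappedBlockChecksum and the 
single whole-file CRC32 are hypothetical, and files larger than 
Integer.MAX_VALUE bytes cannot be mapped in one call. It also shows the 
cleanup problem: closing the channel does not release the mapping, which 
stays alive until the buffer is garbage collected.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.zip.CRC32;

/** Hypothetical sketch of checksumming a block through one memory mapping. */
final class MappedBlockChecksum {
    static long crcOfMappedFile(Path blockFile) throws IOException {
        try (FileChannel ch = FileChannel.open(blockFile, StandardOpenOption.READ)) {
            // Map the complete file once instead of reading it piece by piece.
            MappedByteBuffer mapped =
                ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            CRC32 crc = new CRC32();
            crc.update(mapped); // consumes all remaining bytes of the mapping
            return crc.getValue();
            // Note: the mapping is NOT released when the channel closes. There
            // is no public API to unmap it; it goes away only after the buffer
            // object becomes unreachable and is garbage collected.
        }
    }
}
```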
