[ https://issues.apache.org/jira/browse/HDFS-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13498265#comment-13498265 ]

Uma Maheswara Rao G commented on HDFS-4190:
-------------------------------------------

{quote}
it comes out to basically the same speed as large chunked read() calls, and sometimes slower in multi-threaded environments
{quote}
I am curious to know: have you already written the code for unmapping the 
mmapped buffers? We would need a JNI call for that, right? Currently I just 
used the Cleaner class from the Sun package to unmap them. If we don't unmap 
them, I have seen bad behavior and eventually OOM errors, because the native 
memory is not released until a full GC (or unless it is unmapped directly in 
C code). I have also tried mmapped buffers on the write path. With 36 threads 
I got around a ~20% improvement; I did not try more threads, so I am not sure 
whether it degrades beyond that. (Another part of that improvement was that I 
mapped the complete block once and appended the packet contents into that 
buffer, so writing to consecutive memory locations probably helps somewhat, 
similar to using fallocate.) Unmapping the mmapped buffer is again a heavy 
operation, so I unmapped them in an asynchronous thread; that is what gave me 
the improvement. If I unmap them sequentially, the result is negative :( 
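
For reference, this is roughly what I mean by the Cleaner approach. It is a 
minimal sketch assuming a JDK 6/7 runtime where sun.nio.ch.DirectBuffer and 
sun.misc.Cleaner are available; since these are internal APIs it is not 
suitable for production, which is why a proper JNI unmap call would be needed:
{code}
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MmapCleanExample {

  // Force-unmap a MappedByteBuffer through the internal Sun API.
  // Without this, the native mapping stays alive until the buffer object
  // is garbage collected, typically only at a full GC.
  static void unmap(MappedByteBuffer buffer) {
    sun.misc.Cleaner cleaner = ((sun.nio.ch.DirectBuffer) buffer).cleaner();
    if (cleaner != null) {
      cleaner.clean();
    }
  }

  public static void main(String[] args) throws Exception {
    RandomAccessFile raf = new RandomAccessFile(args[0], "r");
    try {
      FileChannel ch = raf.getChannel();
      MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
      // ... read or verify the mapped data here ...
      unmap(buf); // release the native mapping immediately
    } finally {
      raf.close();
    }
  }
}
{code}
In my experiment I hand the buffers that need unmapping to a separate thread 
that calls clean(), since doing it inline on the write path is what made the 
result negative.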

I have one question here: 
the fadvise options are global in DFS, right? So when I have random small 
reads, wouldn't readahead load unnecessary data? (I am not an expert in the 
internal behavior of the fadvise options :-) )
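
To make the question concrete, here is a rough sketch of what I have in mind: 
choosing the readahead advice per reader instead of one global setting. The 
PosixFadvise class below is a hypothetical JNI binding written just for 
illustration (it is not an existing Hadoop or JDK API); the flag values are 
the usual Linux posix_fadvise(2) constants.
{code}
import java.io.FileDescriptor;

// Hypothetical JNI binding for posix_fadvise(2), for illustration only.
class PosixFadvise {
  // Flag values as defined on Linux.
  static final int POSIX_FADV_RANDOM     = 1; // expect random reads: disable readahead
  static final int POSIX_FADV_SEQUENTIAL = 2; // expect sequential reads: more readahead

  static native int fadvise(FileDescriptor fd, long offset, long len, int advice);
}

class ReadPatternHint {
  // The idea: pick the advice per reader rather than globally, so that
  // small random reads do not trigger readahead of data we never use.
  static void adviseFor(FileDescriptor fd, long length, boolean sequentialScan) {
    int advice = sequentialScan
        ? PosixFadvise.POSIX_FADV_SEQUENTIAL
        : PosixFadvise.POSIX_FADV_RANDOM;
    PosixFadvise.fadvise(fd, 0, length, advice);
  }
}
{code}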


{quote}
Turns out I was remembering wrong - that's what I get for going on JIRA before 
I've had any coffee! I was thinking of HDFS-3529, but that's for the write 
path. A similar improvement could be made on the read path to avoid a memcpy 
for the non-transferTo case.
{quote}
Ok. Yes, we can do something similar on the read path.

                
> Read complete block into memory once in BlockScanning and reduce concurrent 
> disk access
> ---------------------------------------------------------------------------------------
>
>                 Key: HDFS-4190
>                 URL: https://issues.apache.org/jira/browse/HDFS-4190
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node
>    Affects Versions: 3.0.0
>            Reporter: Uma Maheswara Rao G
>
> When we perform bulk write operations to DFS, we observed that block 
> scanning is one bottleneck for concurrent disk access.
> To see the real load on the disks, keep a single data node with a local 
> client flushing data to DFS.
> When we switch off block scanning, we have seen a >10% improvement. I will 
> post the actual figures in a comment.
> Even though I am doing only write operations, there is implicitly a read 
> operation for each block due to block scanning: the next scan happens only 
> after 21 days, but one scan happens right after the block is added, and that 
> scan competes with the writes for disk access.
> The other point to note is that block scanning also reads the block packet 
> by packet. Since we know we have to read and scan the complete block, it may 
> be better to load the complete block once and verify the checksums against 
> that data.
> I tried this with memory-mapped buffers:
> I mapped the complete block once in block scanning and did the checksum 
> verification against it, and saw a good improvement in the bulk write 
> scenario (a rough sketch of this idea follows the quoted description below).
> But we don't have any API to unmap the mapped buffer immediately. In my 
> experiment I just used the Cleaner class from the sun package, which would 
> not be acceptable in production, so we would have to write a JNI call to 
> unmap that mmapped buffer.
> I am not sure whether I have missed something here; please correct me if I 
> have missed some points.
> Thoughts?
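
To make the idea in the description above concrete, here is a rough 
standalone sketch (not the actual DataBlockScanner code) that maps the 
complete block file once and verifies CRC32 checksums chunk by chunk against 
the mapping. The chunk size and the way the expected checksums are passed in 
are simplified assumptions for illustration; HDFS actually keeps them in the 
block's .meta file.
{code}
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.util.zip.CRC32;

public class MappedBlockVerifier {

  // Verify CRC32 checksums over a block file that is mapped into memory
  // once, instead of re-reading it from disk packet by packet.
  // 'expected' holds one checksum per bytesPerChecksum chunk, in order.
  // Assumes the block fits in a single mapping (blocks are well under 2 GB).
  static boolean verify(String blockFile, long[] expected, int bytesPerChecksum)
      throws Exception {
    RandomAccessFile raf = new RandomAccessFile(blockFile, "r");
    try {
      FileChannel ch = raf.getChannel();
      MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());

      byte[] chunk = new byte[bytesPerChecksum];
      CRC32 crc = new CRC32();
      for (int i = 0; buf.hasRemaining(); i++) {
        int len = Math.min(bytesPerChecksum, buf.remaining());
        buf.get(chunk, 0, len);
        crc.reset();
        crc.update(chunk, 0, len);
        if (crc.getValue() != expected[i]) {
          return false; // checksum mismatch in chunk i
        }
      }
      return true;
      // Note: the mapping should be unmapped explicitly here (see the
      // Cleaner/JNI discussion above); relying on GC keeps the native
      // memory alive until a full GC.
    } finally {
      raf.close();
    }
  }
}
{code}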

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
