[jira] [Resolved] (HDFS-4190) Read complete block into memory once in BlockScanning and reduce concurrent disk access

Uma Maheswara Rao G (Jira) Mon, 14 Aug 2023 15:08:10 -0700


     [ 
https://issues.apache.org/jira/browse/HDFS-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Uma Maheswara Rao G resolved HDFS-4190.
---------------------------------------
      Assignee: Uma Maheswara Rao G
    Resolution: Won't Fix

The HDFS caching feature is now in place; anyone who wants to cache blocks 
can use that feature. Resolving this now.

> Read complete block into memory once in BlockScanning and reduce concurrent 
> disk access
> ---------------------------------------------------------------------------------------
>
>                 Key: HDFS-4190
>                 URL: https://issues.apache.org/jira/browse/HDFS-4190
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Uma Maheswara Rao G
>            Assignee: Uma Maheswara Rao G
>            Priority: Major
>
> When performing bulk write operations to DFS, we observed that block 
> scanning is a bottleneck because it adds concurrent disk access.
> To see the real load on the disks, use a single DataNode with a local 
> client flushing data to DFS.
> With block scanning switched off, we saw a >10% improvement. I will post 
> the actual figures in a comment.
> Even though I am only writing, each block is also implicitly read once 
> because of block scanning. The next scan happens only after 21 days, but 
> one scan happens after the block is added. That scan is the concurrent 
> access to the disks.
> Another point to note is that block scanning also reads the block packet 
> by packet. Since we know we have to read and scan the complete block, 
> would it be better to load the complete block once and verify the 
> checksums against that data?
> I tried this with memory-mapped buffers: I mapped the complete block once 
> in block scanning and did the checksum verification against the mapping. 
> I saw a good improvement in the bulk write scenario.
> But there is no API to clean up the mapped buffer immediately. In my 
> experiment I simply used the Cleaner class from the sun package, which 
> would not be correct to use in production. So we would have to write a 
> JNI call to clean up the mmapped buffer.
> I am not sure whether I have missed something here. Please correct me if 
> I have missed any points.
> Thoughts?
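
For readers following the description above, the "map the block once, then 
verify checksums" idea might look roughly like the sketch below. This is not 
the DataNode block scanner code and not the patch from the experiment. The 
class name MappedBlockChecksum, the method checksumChunks, the 512-byte chunk 
size, and the use of java.util.zip.CRC32 are all assumptions made for 
illustration. The sketch also does not address the unmapping problem raised 
above: the mapping is released only when the buffer is garbage collected.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.zip.CRC32;

public class MappedBlockChecksum {
    // Assumed chunk size for illustration only; not read from any HDFS config.
    static final int BYTES_PER_CHECKSUM = 512;

    /**
     * Maps the whole file once and returns one CRC32 value per chunk.
     * Hypothetical helper; CRC32 stands in for whatever checksum type
     * the block's metadata actually records.
     */
    static long[] checksumChunks(Path blockFile, int bytesPerChecksum)
            throws IOException {
        try (FileChannel ch = FileChannel.open(blockFile, StandardOpenOption.READ)) {
            long size = ch.size();
            if (size > Integer.MAX_VALUE) {
                // A single MappedByteBuffer cannot exceed Integer.MAX_VALUE bytes.
                throw new IllegalArgumentException("file too large for one mapping");
            }
            if (size == 0) {
                return new long[0];
            }
            // One mapping for the whole block instead of packet-by-packet reads.
            MappedByteBuffer mapped = ch.map(FileChannel.MapMode.READ_ONLY, 0, size);
            int chunks = (int) ((size + bytesPerChecksum - 1) / bytesPerChecksum);
            long[] sums = new long[chunks];
            CRC32 crc = new CRC32();
            for (int i = 0; i < chunks; i++) {
                int start = i * bytesPerChecksum;
                int end = (int) Math.min(size, (long) start + bytesPerChecksum);
                // View over one chunk; shares the mapping, copies no data.
                ByteBuffer chunk = mapped.duplicate();
                chunk.limit(end);
                chunk.position(start);
                crc.reset();
                crc.update(chunk);
                sums[i] = crc.getValue();
            }
            // The mapping stays alive until 'mapped' is garbage collected.
            return sums;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("blk_demo", ".dat");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, new byte[1300]);
        long[] sums = checksumChunks(tmp, BYTES_PER_CHECKSUM);
        System.out.println("chunks verified: " + sums.length);
    }
}
```

A real scanner would compare each computed value with the checksum stored 
for that chunk rather than just returning the values.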



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
