[ https://issues.apache.org/jira/browse/HDFS-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Uma Maheswara Rao G resolved HDFS-4190.
---------------------------------------
      Assignee: Uma Maheswara Rao G
    Resolution: Won't Fix

We have the HDFS caching feature in place; anyone who wants to cache blocks can use that feature. Resolving this now.

> Read complete block into memory once in BlockScanning and reduce concurrent disk access
> ---------------------------------------------------------------------------------------
>
>                 Key: HDFS-4190
>                 URL: https://issues.apache.org/jira/browse/HDFS-4190
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Uma Maheswara Rao G
>            Assignee: Uma Maheswara Rao G
>            Priority: Major
>
> When we perform bulk write operations to DFS, we observed that block scanning is
> one bottleneck for concurrent disk access.
> To see the real load on the disks, keep a single data node and a local client
> flushing data to DFS.
> When we switched off block scanning, we saw a >10% improvement. I will post the
> real figures in a comment.
> Even though I am doing only write operations, there is implicitly a read
> operation for each block due to block scanning. The next scan happens only
> after 21 days, but one scan happens right after the block is added. This is
> the source of the concurrent disk access.
> The other point to note is that block scanning also reads the block packet by
> packet. Since we know we have to read and scan the complete block anyway, might
> it be better to load the complete block once and verify the checksums against
> that data?
> I tried this with MemoryMappedBuffers: I mapped the complete block once in
> block scanning and ran the checksum verification against the mapping. I saw a
> good improvement in the bulk write scenario.
> But we don't have any API to clean the mapped buffer immediately. In my
> experiment I just used the Cleaner class from the sun package. That would not
> be correct to use in production, so we would have to write a JNI call to clean
> the mmapped buffer.
> I am not sure whether I missed something here. Please correct me if I missed
> some points.
> Thoughts?
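
For readers following the quoted description, the "map the complete block once, then verify checksums" idea could look roughly like the sketch below. It is not the patch from the experiment and not the DataNode's block scanner code: the class and method names are hypothetical, it uses java.util.zip.CRC32 with a caller-supplied chunk size, and it takes the expected checksums as a plain long[] instead of reading an HDFS metadata file. It also shows the limitation raised in the issue: the mapping is released only when the buffer is garbage collected, because the standard API offers no explicit unmap.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.zip.CRC32;

/** Hypothetical helper: verifies a block file through a single memory mapping. */
public final class MappedBlockVerifier {
    private MappedBlockVerifier() {}

    /** Maps the whole file once and returns one CRC32 value per chunk. */
    static long[] computeChunkChecksums(Path blockFile, int bytesPerChecksum)
            throws IOException {
        if (bytesPerChecksum <= 0) {
            throw new IllegalArgumentException("bytesPerChecksum must be positive");
        }
        try (FileChannel ch = FileChannel.open(blockFile, StandardOpenOption.READ)) {
            long size = ch.size();
            if (size > Integer.MAX_VALUE) {
                // A single MappedByteBuffer cannot cover more than 2 GiB - 1 bytes.
                throw new IOException("file too large for one mapping: " + size);
            }
            // One mapping for the whole block instead of packet-by-packet reads.
            // Note: closing the channel does NOT unmap the buffer; the mapping
            // lives until the buffer is garbage collected.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, size);
            int chunks = (int) ((size + bytesPerChecksum - 1) / bytesPerChecksum);
            long[] sums = new long[chunks];
            CRC32 crc = new CRC32();
            for (int i = 0; i < chunks; i++) {
                int len = Math.min(bytesPerChecksum, buf.remaining());
                ByteBuffer chunk = buf.slice(); // view starting at current position
                chunk.limit(len);
                crc.reset();
                crc.update(chunk);              // checksums exactly len bytes
                sums[i] = crc.getValue();
                buf.position(buf.position() + len);
            }
            return sums;
        }
    }

    /** Returns the index of the first chunk whose checksum differs, or -1 if all match. */
    static int firstCorruptChunk(Path blockFile, int bytesPerChecksum, long[] expected)
            throws IOException {
        long[] actual = computeChunkChecksums(blockFile, bytesPerChecksum);
        if (actual.length != expected.length) {
            throw new IllegalArgumentException("expected " + actual.length
                    + " checksums but got " + expected.length);
        }
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] != expected[i]) {
                return i;
            }
        }
        return -1;
    }
}
```

A caller would compute or load the expected checksums for a block file, then call firstCorruptChunk(path, 512, expected); the value 512 here is only an example chunk size. Whether a single mapping actually beats packet-by-packet reads depends on the workload and would need to be measured, as the issue itself notes.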