[jira] [Updated] (HDFS-10999) Introduce separate stats for Replicated and Erasure Coded Blocks apart from the current Aggregated stats

Manoj Govindassamy (JIRA) Tue, 09 May 2017 11:53:30 -0700

     [ 
https://issues.apache.org/jira/browse/HDFS-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Manoj Govindassamy updated HDFS-10999:
--------------------------------------
    Attachment: HDFS-10999.03.patch

Thanks [~tasanuma0829] and [~andrew.wang] for the detailed review. Attaching 
v03 patch to address the following comments and bunch of renamings. While you 
review this revision, i will look at more tests that can be added to verify the 
missing cases.

bq. BlockManagerSafeMode: How about using LongAccumulator for 
numberOfBytesInFutureBlocks, too?
Done. numberOfBytesInFutureBlocks is no more needed as it can be derived from 
bytesInFutureReplicatedBlocks and bytesInFutureStripedBlocks. 
numberOfBytesInFutureBlocks is now removed.

bq. CorruptReplicasMap: decrementBlockStat be included in the if statement?Done
    
bq. It seems package private is enough for new methods 
getCorruptReplicatedBlocksStat and getCorruptStripedBlocksStat.
 Done 

bq. LowRedundancyBlocks: Looks like corruptReplicatedOneBlocks is same as 
corruptReplOneBlocks. How about reusing corruptReplOneBlocks instead of 
calculating corruptReplicatedOneBlocks?
Done. To be consistent with newly introdcued stats, removed 
corruptReplOneBlocks and used corruptReplicatedOneBlocks

bq. InvalidateBlocks: Maybe I should have asked earlier, but if it is not much 
trouble for you, how about doing InvalidateBlocks-related work as a follow-on 
task?
Simplified the code by using two maps instead of 1. Still there are quite a few 
new changes. Tried reverting this file change, but it ended up making the patch 
inconsistent for few stats. Please take a look at the new version one more time.

bq. ReplicatedBlocksStatsMBean: the equivalent of getCorruptBlocks is now 
called getCorruptReplicaBlocksStat. Why not getCorruptBlocksStat instead?
Done. Renamed the methods as suggested.
    
bq. Should we rename getUnderReplicatedBlocksStat to 
getLowRedundancyReplicatedBlocksStat or similar to standardize with the EC 
naming? Since we deprecated getUnderReplicatedBlocks in favor of 
getLowRedundancyBlocks, it seems odd to bring this same name back here.
Done. Renamed this to getLowRedundancyBlocksStat.

bq. ECBlockGroupsStatusMBean: the noun is an "EC block group", should we name 
these e.g. getLowRedundancyECBlockGroupsStat, getCorruptECBlockGroupsStat, 
etc.? We could also shorten from "ECBlockGroup" to just "BlockGroup" if you 
think the MBean name by itself is sufficient documentation.
Done. Followed the approach used in fsck as suggested by Takanobu and renamed 
all these methods to have "ECBlockGroups"

bq. Misc: Any reason you chose to use LongAccumulator rather than LongAdder 
everywhere?
Done. LongAdder internally uses LongAccumulator. Anyway, switched to LongAdder 
as it is far easier to work with than the other.

bq. DFSClient and DistributedFileSystem aren't public APIs, so we don't need to 
preserve getUnderReplicatedBlocksCount if these methods are unused. Can move 
DFSAdmin over to using the new call.
Done.

bq. CorruptReplicasMap: Nit: Please put the new increment and decrement 
functions next to each other for clarity
Done

bq. In this block of code, should the decrement be moved inside the isEmpty 
case? Testing ?
Done. Tests yet to be added for this particular case.

bq. InvalidateBlocks: A lot of the new code is because of the indexing into the 
new array in the map and maintaining the separate counts. If we instead add 
another map, e.g. nodeToBlockGroups, we could avoid this.
Done.

bq. Can we simplify the limiting code on 297 ..?
DONE

bq. LowRedundancyBlocks: Rename from StripedBlocks to instead 
StripedBlockGroups?
Done. To be consistent with other places, I used EC prefix instead of Striped, 
but also included BlockGroups. 

bq. We're duplicating the corruptReplOneBlocks if statement, let's either move 
handling corruptReplicatedOneBlocks out of incrementBlockStat, or also 
increment corruptReplOneBlocks inside incrementBlockStat. Same comment for 
remove
Removed corruptReplOneBlocks as it is handled by corruptReplicatedOneBlocks 
fully.


> Introduce separate stats for Replicated and Erasure Coded Blocks apart from 
> the current Aggregated stats
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10999
>                 URL: https://issues.apache.org/jira/browse/HDFS-10999
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: erasure-coding
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Wei-Chiu Chuang
>            Assignee: Manoj Govindassamy
>              Labels: hdfs-ec-3.0-nice-to-have, supportability
>         Attachments: HDFS-10999.01.patch, HDFS-10999.02.patch, 
> HDFS-10999.03.patch
>
>
> Per HDFS-9857, it seems in the Hadoop 3 world, people prefer the more generic 
> term "low redundancy" to the old-fashioned "under replicated". But this term 
> is still being used in messages in several places, such as web ui, dfsadmin 
> and fsck. We should probably change them to avoid confusion.
> File this jira to discuss it.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Updated] (HDFS-10999) Introduce separate stats for Replicated and Erasure Coded Blocks apart from the current Aggregated stats

Reply via email to