[
https://issues.apache.org/jira/browse/HDFS-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15881777#comment-15881777
]
Manoj Govindassamy commented on HDFS-10999:
-------------------------------------------
Based on the discussions and consensus above, my understanding is that we want
tools and the UI to report Replicated and EC blocks separately.
1. The {{fsck}} command already reports Replicated blocks and EC blocks separately.
I verified the reporting for EC blocks and it looks good to me. I am not planning
any further changes to {{fsck}} under this jira for now.
{noformat}
# hdfs fsck /
Connecting to namenode via http://127.0.0.1:50002/fsck?ugi=manoj&path=%2F
FSCK started by manoj (auth:SIMPLE) from /127.0.0.1 for path / at Thu Feb 23
15:21:06 PST 2017
Status: HEALTHY
Number of data-nodes: 3
Number of racks: 1
Total dirs: 5
Total symlinks: 0
Replicated Blocks:
Total size: 10240000 B
Total files: 5
Total blocks (validated): 5 (avg. block size 2048000 B)
Minimally replicated blocks: 5 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 3.0
Missing blocks: 0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Erasure Coded Block Groups:
Total size: 10240000 B
Total files: 5
Total block groups (validated): 5 (avg. block group size 2048000 B)
Minimally erasure-coded block groups: 5 (100.0 %)
Over-erasure-coded block groups: 0 (0.0 %)
Under-erasure-coded block groups: 0 (0.0 %)
Unsatisfactory placement block groups: 0 (0.0 %)
Default ecPolicy: RS-DEFAULT-6-3-64k
Average block group size: 3.0
Missing block groups: 0
Corrupt block groups: 0
Missing internal blocks: 0 (0.0 %)
FSCK ended at Thu Feb 23 15:21:06 PST 2017 in 15 milliseconds
The filesystem under path '/' is HEALTHY
{noformat}
----
2. The {{dfsadmin -report}} command does not report EC blocks separately. Today,
the report command gets its stats from {{FSNamesystem#getStats()}}, which returns
the combined stats for both Replicated and EC blocks.
* {noformat}
# hdfs dfsadmin -report
Configured Capacity: 1498775814144 (1.36 TB)
Present Capacity: 931852427264 (867.86 GB)
DFS Remaining: 931805765632 (867.81 GB)
DFS Used: 46661632 (44.50 MB)
DFS Used%: 0.01%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
Pending deletion blocks: 0
{noformat}
*Proposal:*
-- For backward compatibility, leave the current
{{FSNamesystem#getStats()}} as is; it will continue to return cumulative
stats for all blocks combined.
-- Introduce {{FSNamesystem#getReplicatedBlockStats()}} and
{{FSNamesystem#getECBlockStats()}} to capture Replicated and EC block stats
separately.
-- In the report, {{Under replicated blocks}}, {{Blocks with corrupt replicas}} and
{{Missing blocks}} will show stats for Replicated blocks only (instead of the
current cumulative numbers).
-- New fields such as {{Under erasure coded block groups}}, {{Corrupt erasure
coded block groups}} and {{Missing erasure coded block groups}} will be added to
the report command, containing stats for erasure coded blocks only.
* {noformat}
# hdfs dfsadmin -report
Configured Capacity: 1498775814144 (1.36 TB)
Present Capacity: 931852427264 (867.86 GB)
DFS Remaining: 931805765632 (867.81 GB)
DFS Used: 46661632 (44.50 MB)
DFS Used%: 0.01%
Replicated Blocks:
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
Pending deletion blocks: 0
Erasure Coded Block Groups:
Under erasure coded block groups: 0
Erasure coded blocks with corrupt replicas: 0
Missing erasure coded blocks: 0
Pending deletion erasure coded blocks: 0
{noformat}
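To make the proposed split concrete, here is a minimal Java sketch of keeping the two sets of counters apart while still being able to derive the cumulative number. The class, field, and method names are hypothetical, chosen for illustration only, and are not the actual Hadoop types; the printed labels follow the field names suggested in the proposal.

```java
// Hypothetical sketch: none of these types are actual Hadoop classes.
final class BlockStatsSketch {

    /** Counters for replicated blocks only (names are illustrative). */
    static final class ReplicatedBlockStats {
        final long lowRedundancy;
        final long corrupt;
        final long missing;
        final long missingReplOne;
        final long pendingDeletion;

        ReplicatedBlockStats(long lowRedundancy, long corrupt, long missing,
                             long missingReplOne, long pendingDeletion) {
            this.lowRedundancy = lowRedundancy;
            this.corrupt = corrupt;
            this.missing = missing;
            this.missingReplOne = missingReplOne;
            this.pendingDeletion = pendingDeletion;
        }
    }

    /** Counters for erasure coded block groups only (names are illustrative). */
    static final class ECBlockGroupStats {
        final long lowRedundancy;
        final long corrupt;
        final long missing;
        final long pendingDeletion;

        ECBlockGroupStats(long lowRedundancy, long corrupt, long missing,
                          long pendingDeletion) {
            this.lowRedundancy = lowRedundancy;
            this.corrupt = corrupt;
            this.missing = missing;
            this.pendingDeletion = pendingDeletion;
        }
    }

    /** Cumulative low-redundancy count, as a combined stats call would return it. */
    static long totalLowRedundancy(ReplicatedBlockStats r, ECBlockGroupStats ec) {
        return r.lowRedundancy + ec.lowRedundancy;
    }

    /** Renders the block part of the report in the proposed two-section layout. */
    static String renderReport(ReplicatedBlockStats r, ECBlockGroupStats ec) {
        StringBuilder sb = new StringBuilder();
        sb.append("Replicated Blocks:\n");
        sb.append("Under replicated blocks: ").append(r.lowRedundancy).append('\n');
        sb.append("Blocks with corrupt replicas: ").append(r.corrupt).append('\n');
        sb.append("Missing blocks: ").append(r.missing).append('\n');
        sb.append("Missing blocks (with replication factor 1): ")
          .append(r.missingReplOne).append('\n');
        sb.append("Pending deletion blocks: ").append(r.pendingDeletion).append('\n');
        sb.append("Erasure Coded Block Groups:\n");
        sb.append("Under erasure coded block groups: ").append(ec.lowRedundancy).append('\n');
        sb.append("Corrupt erasure coded block groups: ").append(ec.corrupt).append('\n');
        sb.append("Missing erasure coded block groups: ").append(ec.missing).append('\n');
        sb.append("Pending deletion erasure coded blocks: ")
          .append(ec.pendingDeletion).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up sample values, only to exercise the sketch.
        ReplicatedBlockStats r = new ReplicatedBlockStats(2, 1, 0, 0, 4);
        ECBlockGroupStats ec = new ECBlockGroupStats(3, 0, 1, 0);
        System.out.print(renderReport(r, ec));
        System.out.println("Cumulative low redundancy: " + totalLowRedundancy(r, ec));
    }
}
```

Keeping the cumulative value as a derived sum is one way an existing combined call could stay backward compatible while the two new views are added next to it.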
----
3. For the WebUI, {{FSNamesystemMBean}} needs to be extended in order to report
Erasure Coded block details.
-- Currently the following are reported under the Summary section of the
NameNode UI, but they include both Replicated and EC stats: {noformat}
Number of Under-Replicated Blocks
Number of Blocks Pending Deletion
{noformat}
-- [~lewuathe] has already
[proposed|https://issues.apache.org/jira/secure/attachment/12852567/Screen%20Shot%202017-02-14%20at%2022.43.57.png]
a patch for adding Total EC blocks and its size under HDFS-8196.
*Proposal:*
-- Display the Replicated and EC block stats separately in the Summary section
of the NameNode UI, with no cumulative stats: {noformat}
Number of Under-Replicated Blocks
Number of Blocks Pending Deletion
Number of Under-Erasure-Coded Block Groups
Number of Erasure Coded Blocks Pending Deletion
{noformat}
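As a rough illustration of the Summary rows above, the following Java sketch shows an MBean-style interface with one getter per row and no cumulative getter. The interface and getter names are assumptions made up for this sketch; they are not the existing MBean methods, and the sample values are invented.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: the interface and getter names below are assumptions.
final class SummarySketch {

    /** MBean-style view exposing replicated and EC counters separately. */
    interface BlockSummaryMBean {
        long getLowRedundancyReplicatedBlocks();
        long getPendingDeletionReplicatedBlocks();
        long getLowRedundancyECBlockGroups();
        long getPendingDeletionECBlocks();
    }

    /** Builds the Summary rows in display order, one row per counter. */
    static Map<String, Long> summaryRows(BlockSummaryMBean bean) {
        Map<String, Long> rows = new LinkedHashMap<>();
        rows.put("Number of Under-Replicated Blocks",
                bean.getLowRedundancyReplicatedBlocks());
        rows.put("Number of Blocks Pending Deletion",
                bean.getPendingDeletionReplicatedBlocks());
        rows.put("Number of Under-Erasure-Coded Block Groups",
                bean.getLowRedundancyECBlockGroups());
        rows.put("Number of Erasure Coded Blocks Pending Deletion",
                bean.getPendingDeletionECBlocks());
        return rows;
    }

    public static void main(String[] args) {
        // Made-up values, only to exercise the sketch.
        BlockSummaryMBean bean = new BlockSummaryMBean() {
            @Override public long getLowRedundancyReplicatedBlocks() { return 2; }
            @Override public long getPendingDeletionReplicatedBlocks() { return 4; }
            @Override public long getLowRedundancyECBlockGroups() { return 3; }
            @Override public long getPendingDeletionECBlocks() { return 0; }
        };
        summaryRows(bean).forEach(
                (label, value) -> System.out.println(label + ": " + value));
    }
}
```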
[~andrew.wang], [~aw], [~tasanuma0829], [~jojochuang], [~yuanbo], can you
please share your thoughts on the above proposals? Thanks.
> Use more generic "low redundancy" blocks instead of "under replicated" blocks
> -----------------------------------------------------------------------------
>
> Key: HDFS-10999
> URL: https://issues.apache.org/jira/browse/HDFS-10999
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: erasure-coding
> Affects Versions: 3.0.0-alpha1
> Reporter: Wei-Chiu Chuang
> Assignee: Manoj Govindassamy
> Labels: hdfs-ec-3.0-nice-to-have, supportability
>
> Per HDFS-9857, it seems in the Hadoop 3 world, people prefer the more generic
> term "low redundancy" to the old-fashioned "under replicated". But this term
> is still being used in messages in several places, such as web ui, dfsadmin
> and fsck. We should probably change them to avoid confusion.
> File this jira to discuss it.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)