[
https://issues.apache.org/jira/browse/HDFS-10778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15430007#comment-15430007
]
Yiqun Lin commented on HDFS-10778:
----------------------------------
Hi [~ajisakaa], I am now working on this jira and have found some other problems
here.
{quote}
so would you add a new option to optimize the output? '-h' is good for me.
{quote}
The option '-h' is already used for {{-help}} in the hdfs oiv command, so it seems
we need a different option. I now use a new option, {{-format}}, instead.
I also found another bug in class {{OfflineImageViewer}} while testing the new
option: the following code is missing from method
{{OfflineImageViewer#buildOptions}}:
{code}
options.addOption("maxSize", true, "");
options.addOption("step", true, "");
{code}
Because these options are not registered, a {{ParseException}} is thrown when
{{parser.parse(options, args)}} is called:
{code}
org.apache.commons.cli.UnrecognizedOptionException: Unrecognized option: -maxSize
{code}
The test output in my local env:
{code}
with -format option:
Size Range NumFiles
(0 B, 8 B] 12
totalFiles = 12
totalDirectories = 9
totalBlocks = 12
totalSpace = 12
maxFileSize = 1
without -format option:
Size NumFiles
8 12
totalFiles = 12
totalDirectories = 9
totalBlocks = 12
totalSpace = 12
maxFileSize = 1
{code}
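The range labels shown above could be produced roughly as in the sketch below. This is only an illustration, not the code in the patch: the class {{SizeRangeSketch}} and its methods are hypothetical names, and it assumes that each row's size is the upper bound of a bucket whose width equals the step, with the first bucket printed as {{(0 B, 0 B]}} as in the expected output.

```java
final class SizeRangeSketch {
    // Binary prefixes; hypothetical helper, not the Hadoop formatting utility.
    private static final String[] UNITS = {"B", "K", "M", "G", "T", "P", "E"};

    /** Renders a byte count with a binary prefix, e.g. 1048576 becomes "1 M". */
    static String humanReadable(long bytes) {
        int unit = 0;
        long value = bytes;
        // Divide only while the value is an exact multiple of 1024,
        // so a bucket boundary is never rounded.
        while (value >= 1024 && value % 1024 == 0 && unit < UNITS.length - 1) {
            value /= 1024;
            unit++;
        }
        return value + " " + UNITS[unit];
    }

    /** Formats the bucket ending at upperBound as "(lower, upper]". */
    static String formatRange(long upperBound, long step) {
        // Assumption: the bucket covers (upperBound - step, upperBound],
        // and the lower bound is clamped at zero for the first bucket.
        long lowerBound = Math.max(0L, upperBound - step);
        return "(" + humanReadable(lowerBound) + ", " + humanReadable(upperBound) + "]";
    }
}
```

For example, with a step of 1048576 bytes the upper bound 2097152 is labelled {{(1 M, 2 M]}}, and with a step of 8 bytes the upper bound 8 is labelled {{(0 B, 8 B]}}.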
Finally, I have attached a new patch for this. Thanks for the review.
> Optimize the output result of FileDistribution processor in hdfs oiv command
> ----------------------------------------------------------------------------
>
> Key: HDFS-10778
> URL: https://issues.apache.org/jira/browse/HDFS-10778
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: tools
> Affects Versions: 2.7.1
> Reporter: Yiqun Lin
> Assignee: Yiqun Lin
> Priority: Minor
> Attachments: HDFS-10778.001.patch, HDFS-10778.002.patch
>
>
> Currently the output of the {{FileDistribution}} processor in the hdfs oiv
> command is not easy for users to understand. For example, this is an
> original output:
> {code}
> Size NumFiles
> 0 22556
> 1048576 404971
> 2097152 29259
> 3145728 16937
> 4194304 9197
> 5242880 6889
> 6291456 4930
> 7340032 4070
> 8388608 299384
> 9437184 274623
> {code}
> Two aspects make it hard for users to understand.
> First, the size column is shown as a raw number of bytes, which is not
> readable. It would be better to show it with a binary prefix.
> Second, the size column would be better shown as a size range, so that
> users know the value in the {{NumFiles}} column was counted from size A to
> size B.
> The expected output result should be this:
> {code}
> Size Range NumFiles
> (0 B, 0 B] 1666332
> (0 B, 1 M] 778473
> (1 M, 2 M] 35125
> (2 M, 3 M] 13978
> (3 M, 4 M] 10158
> (4 M, 5 M] 6970
> {code}