[ 
https://issues.apache.org/jira/browse/HDFS-13960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16693205#comment-16693205
 ] 

Adam Antal commented on HDFS-13960:
-----------------------------------

Thanks for the patch [~ljain], good work!
 I have some minor remarks to patch v001:
 - CLI test fails because in {{testConf.xml:715}}
{code:xml}
<comparator>
  <type>RegexpComparator</type>
  <expected-output>^-checksum <src> \.\.\. :\s*</expected-output>
</comparator>
{code}
should be modified since the output of the checksum command is modified as well.

 - I think in {{TestDFSShell.java:1143}}
{code:java}
    ...
    } finally {
      if (printStream != null) {
        System.setOut(printStream);
      }
    }
{code}
If {{System.out}} was null at the beginning, then you leave the created 
{{PrintStream}} as {{System.out}} - I think this is not good for the following 
tests. I believe the condition here is unnecessary, because if somehow the 
{{System.out}} was null I'd rather throw an assertion error. Also it can only 
be null, if {{System.out}} was null at the beginning of the test, so maybe an 
assertion there is justified.

 - In {{Display.java:203}}
{code:java}
  FileChecksum checksum = item.fs.getFileChecksum(item.path);
  if (checksum == null) {
    out.printf("%s\tNONE\t%n", item.toString());
  } else {
  ...
{code}
In my opinion if the {{-v}} was provided intentionally, even if the checksum is 
null there should be displayed the blocksize as well. Also, we should handle 
the case where the FileStatus object is null in order to avoid NPE: so we 
should have an Unknown or some text displayed just like "NONE" when the 
checksum is not found.

 - That one checkstyle warning can be ignored I guess (input parameters are 
initialized that way).

> hdfs dfs -checksum command should optionally show block size in output
> ----------------------------------------------------------------------
>
>                 Key: HDFS-13960
>                 URL: https://issues.apache.org/jira/browse/HDFS-13960
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs
>            Reporter: Adam Antal
>            Assignee: Lokesh Jain
>            Priority: Minor
>         Attachments: HDFS-13960.001.patch
>
>
> The hdfs checksum command computes the checksum in a distributed manner, 
> which would take into account the block size. In other words, the block size 
> determines how the file will be broken up.
> Therefore it can happen that the checksum command produces different outputs 
> for the exact same file only differing in the block size: 
> checksum(fileABlock1) + checksum(fileABlock2) != checksum(fileABlock1 + 
> fileABlock2)
> I suggest to add an option to the hdfs dfs -checksum command which would 
> displays the block size along with the output, and that could also be helpful 
> in some other cases where this piece of information is needed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to