we should automate checks of the output of the sort example program
-------------------------------------------------------------------

                 Key: HADOOP-877
                 URL: https://issues.apache.org/jira/browse/HADOOP-877
             Project: Hadoop
          Issue Type: Improvement
          Components: test
    Affects Versions: 0.10.0
            Reporter: Owen O'Malley
         Assigned To: Nigel Daley


Since we use the sort example program to smoke-test new versions of Hadoop, 
we should check its output automatically rather than only confirming that the 
job completes. The checks that I've considered:
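  1. count the number of records in the input and in the output and compare 
the totals
  2. compute the MD5 of each key/value pair, XOR the digests across all of the 
rows, and compare the result for the input with the result for the output
  3. use a map/reduce job to merge the input and output directories and make 
sure that each key/value pair appears in both the input and the output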
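
The three checks can be sketched in memory as follows. This is only an 
illustration, not the test that ships with Hadoop: it assumes records are 
(key, value) pairs of bytes, the function names checksum, same_records and 
merge_check are hypothetical, and merge_check is a single-process stand-in for 
the map/reduce job described in check 3.

```python
import hashlib
from collections import Counter


def checksum(records):
    """Checks 1 and 2: return (record count, XOR of per-record MD5 digests)."""
    count = 0
    acc = 0
    for key, value in records:
        h = hashlib.md5()
        # Length-prefix the key so (b"ab", b"c") and (b"a", b"bc") hash differently.
        h.update(len(key).to_bytes(4, "big"))
        h.update(key)
        h.update(value)
        # XOR is commutative, so the result does not depend on record order.
        acc ^= int.from_bytes(h.digest(), "big")
        count += 1
    return count, acc


def same_records(input_records, output_records):
    """True when both sides have the same record count and the same XOR digest."""
    return checksum(input_records) == checksum(output_records)


def merge_check(input_records, output_records):
    """Check 3 (in-memory stand-in): map each record to the difference between
    its number of occurrences in the input and in the output; empty means match."""
    diff = Counter(input_records)
    diff.subtract(Counter(output_records))
    return {rec: n for rec, n in diff.items() if n != 0}
```

Because the XOR is order-independent, the digest computed on the unsorted 
input can be compared directly with the digest of the sorted output. One 
limitation: a record that occurs an even number of times cancels itself out of 
the XOR, so checks 1 and 2 together can miss some substitutions among 
duplicated records. Check 3 compares occurrence counts per record and does not 
have that gap.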

