[ https://issues.apache.org/jira/browse/TEZ-733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002147#comment-14002147 ]
Hitesh Shah edited comment on TEZ-733 at 5/19/14 6:40 PM:
----------------------------------------------------------

[~rekhajoshm] Thanks for taking this up.

Some comments:
  - I believe tez-tests already depends (or should depend) on tez-mapreduce-examples, so it would be better to add this test to that module, either as part of TestMRRJobsDAGApi or as a separate file.
  - The jira title seems to talk about refactoring OrderedWordCount. I will create a separate jira for the refactor and keep this one for the unit test.

Comments on the test itself:
  - There is a big section of commented-out code which should be removed.
  - Instead of just relying on the exit code, we should also be verifying the data. I see the commented-out section had some comments in place for this. A simple approach would be to generate the data as part of the test itself and check the output file for the expected result. You can use the FileSystem APIs to create a file, write data into it, and read the output files. A simple input with multiple files, each file having a set of words (one word per line), should be used, and that would end up generating a single output file (with one word per line).

> Add OrderedWordCount to the test suite and refactor it to use best practices for Tez API
> ----------------------------------------------------------------------------------------
>
>                 Key: TEZ-733
>                 URL: https://issues.apache.org/jira/browse/TEZ-733
>             Project: Apache Tez
>          Issue Type: Test
>   Affects Versions: 0.5.0
>            Reporter: Bikas Saha
>            Assignee: Rekha Joshi
>            Priority: Blocker
>              Labels: patch, test
>         Attachments: TEZ-733.1.patch
>
>
> Some real jobs running in miniCluster would help catch issues like TEZ-732 faster. Also, with local mode, these would run fast.
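
For reference, the data-verification approach suggested in the comment (generate the input inside the test, then compare the job output against the expected result) could be sketched roughly as below. This is only an illustration under stated assumptions: it uses java.nio.file in place of the Hadoop FileSystem APIs so that it runs without a cluster, the class and method names (WordCountFixture, writeInputs, expectedCounts, readCounts) are hypothetical and not taken from the patch, and the tab-separated "word, count" output format is assumed rather than confirmed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration; not part of Tez or of TEZ-733.1.patch.
final class WordCountFixture {

  // Writes one input file per inner list, one word per line.
  static void writeInputs(Path inputDir, List<List<String>> files) throws IOException {
    Files.createDirectories(inputDir);
    for (int i = 0; i < files.size(); i++) {
      Path file = inputDir.resolve("input-" + i + ".txt");
      Files.write(file, files.get(i), StandardCharsets.UTF_8);
    }
  }

  // Computes the per-word counts the job should produce for the generated input.
  static Map<String, Integer> expectedCounts(List<List<String>> files) {
    Map<String, Integer> counts = new TreeMap<>();
    for (List<String> words : files) {
      for (String word : words) {
        counts.merge(word, 1, Integer::sum);
      }
    }
    return counts;
  }

  // Parses an output file, ASSUMING one "word<TAB>count" record per line.
  static Map<String, Integer> readCounts(Path outputFile) throws IOException {
    Map<String, Integer> counts = new TreeMap<>();
    for (String line : Files.readAllLines(outputFile, StandardCharsets.UTF_8)) {
      if (line.isEmpty()) {
        continue; // tolerate a trailing blank line
      }
      String[] parts = line.split("\t");
      counts.put(parts[0], Integer.parseInt(parts[1].trim()));
    }
    return counts;
  }
}
```

In a real test the input would be written through the cluster's FileSystem, the job would be run, and the map returned by readCounts for the output file would be compared with expectedCounts using assertEquals. Comparing parsed maps checks the counts only; verifying the ordering of the output would need a separate check on the line order.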