[jira] [Comment Edited] (TEZ-733) Add OrderedWordCount to the test suite and refactor it to use best practices for Tez API

Hitesh Shah (JIRA) Mon, 19 May 2014 11:43:35 -0700

    [ https://issues.apache.org/jira/browse/TEZ-733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002147#comment-14002147 ]


Hitesh Shah edited comment on TEZ-733 at 5/19/14 6:40 PM:
----------------------------------------------------------

[~rekhajoshm] Thanks for taking this up.

Some comments:
   - I believe tez-tests already depends (or should depend) on 
tez-mapreduce-examples, so it would be better to add this test to that module, 
either as part of TestMRRJobsDAGApi or as a separate file.
   - The jira title also talks about refactoring OrderedWordCount. I will 
create a separate jira for the refactor and keep this one for the unit test. 

Comments on the test itself:
     - There is a big section of commented-out code that should be removed. 
     - Instead of relying only on the exit code, the test should also verify 
the data. The commented-out section already had some notes in place for this. 
A simple approach is to generate the input data as part of the test itself and 
check the output file for the expected result. You can use the FileSystem APIs 
to create a file, write data into it, and read the output files. A simple input 
of multiple files, each holding a set of words (one word per line), should be 
used; that would end up generating a single output file (with one word per 
line).
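
As a rough illustration of the data check described above, the sketch below 
writes several one-word-per-line input files, records the distinct words, and 
then checks that an output directory holds a single data file listing each 
word exactly once. It uses java.nio.file as a local stand-in so that it runs 
without a cluster; in an actual Tez test the same steps would presumably go 
through the Hadoop FileSystem APIs (create, open, listStatus). The class and 
method names, the input file names, and the assumption that each output line 
starts with the word (optionally followed by whitespace and other fields) are 
hypothetical and not taken from the patch.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper; not part of Tez or of TEZ-733.1.patch.
final class WordCountTestData {

  // Writes one input file per inner list (one word per line) and returns
  // the distinct words across all files, sorted.
  static SortedSet<String> writeInputs(Path inputDir, List<List<String>> files)
      throws IOException {
    Files.createDirectories(inputDir);
    SortedSet<String> distinctWords = new TreeSet<>();
    for (int i = 0; i < files.size(); i++) {
      // File naming is an assumption made for this sketch.
      Path file = inputDir.resolve("input-" + i + ".txt");
      Files.write(file, files.get(i), StandardCharsets.UTF_8);
      distinctWords.addAll(files.get(i));
    }
    return distinctWords;
  }

  // Returns true only if outputDir holds exactly one data file and that file
  // lists every expected word exactly once.
  static boolean verifyOutput(Path outputDir, SortedSet<String> expectedWords)
      throws IOException {
    List<Path> dataFiles = new ArrayList<>();
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(outputDir)) {
      for (Path entry : entries) {
        String name = entry.getFileName().toString();
        // Skip marker and hidden files such as _SUCCESS.
        if (!name.startsWith("_") && !name.startsWith(".")) {
          dataFiles.add(entry);
        }
      }
    }
    if (dataFiles.size() != 1) {
      return false;
    }
    List<String> words = new ArrayList<>();
    for (String line : Files.readAllLines(dataFiles.get(0), StandardCharsets.UTF_8)) {
      String trimmed = line.trim();
      if (!trimmed.isEmpty()) {
        // Assumption: the word is the first whitespace-separated token.
        words.add(trimmed.split("\\s+")[0]);
      }
    }
    Collections.sort(words);
    return words.equals(new ArrayList<>(expectedWords));
  }
}
```

A test along these lines would call writeInputs before submitting the job, 
assert on the exit code as it does today, and then additionally assert that 
verifyOutput returns true for the job's output directory.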




was (Author: hitesh):
[~rekhajoshm] Thanks for taking this up.

Some comments:
   - I believe tez-tests already depends ( or should be ) on 
tez-mapreduce-examples so it would be better to add this test into that module 
either as part of TestMRRJobsDAGApi or a separate file.
   - The jira title seems to talk about refactoring OrderedWordCount. I will 
create a separate jira for the refactor and keep this one for the unit test. 
   - Comment on the test itself
     - there is a big section of commented out code which should be removed. 
     - instead of just relying on the exit code, we should also be verifying 
the data. I see the commented out section had some comments in place for this. 
A simple approach to this could be to generate the data as part of the test 
itself and check the output file for the expected result. You can use the 
FileSystem apis to create a file, write data into it and read the output files. 
A simple input with multiple files - each file having a set of words ( one word 
per line ) should be used and that would end up generating a single output file 
( with one word per line ).



> Add OrderedWordCount to the test suite and refactor it to use best practices 
> for Tez API
> ----------------------------------------------------------------------------------------
>
>                 Key: TEZ-733
>                 URL: https://issues.apache.org/jira/browse/TEZ-733
>             Project: Apache Tez
>          Issue Type: Test
>    Affects Versions: 0.5.0
>            Reporter: Bikas Saha
>            Assignee: Rekha Joshi
>            Priority: Blocker
>              Labels: patch, test
>         Attachments: TEZ-733.1.patch
>
>
> Some real jobs running in miniCluster would help catch issues like TEZ-732 
> faster. Also, with local mode, these would run fast.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
