[ 
https://issues.apache.org/jira/browse/PIG-1899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-1899:
----------------------------

    Attachment: e2e.patch

The attached patch contains a testing tool developed by the Yahoo Pig team to 
handle end-to-end testing.  In its current form it is neither complete nor yet 
usable.  In particular, I had to remove all of the code dealing with deploying 
the test cluster and database, since that was Yahoo-specific.  The patch is 
placed here as a starting point.

The patch contains two separate pieces: a test harness and Pig-specific 
components.  The test harness is a fairly simple set of Perl modules that 
export one main class, TestDriver.  Users must extend this class and implement 
its three abstract methods, runTest, generateBenchmark, and compare.  runTest 
runs the test.  generateBenchmark generates results from the source of truth.  
compare compares the outputs of runTest and generateBenchmark and decides if 
the test succeeded or failed.
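
As a rough illustration of that contract, the sketch below models it in Python 
rather than in the Perl of the patch.  The class Driver stands in for 
TestDriver, and its three abstract methods mirror the ones named above.  The 
argument shapes, the run helper, and the EchoDriver subclass are assumptions 
made for illustration, not the harness's actual API.

```python
from abc import ABC, abstractmethod


class Driver(ABC):
    """Hypothetical Python stand-in for the harness's TestDriver class."""

    @abstractmethod
    def run_test(self, test):
        """Run the test and return its output."""

    @abstractmethod
    def generate_benchmark(self, test):
        """Produce the expected output from the source of truth."""

    @abstractmethod
    def compare(self, test_result, benchmark_result):
        """Return True if the test output matches the benchmark."""

    def run(self, test):
        # Assumed control flow: run the test, build the benchmark, compare.
        return self.compare(self.run_test(test), self.generate_benchmark(test))


class EchoDriver(Driver):
    """Toy subclass: both outputs are read straight from the test hash."""

    def run_test(self, test):
        return test["actual"]

    def generate_benchmark(self, test):
        return test["expected"]

    def compare(self, test_result, benchmark_result):
        return test_result == benchmark_result
```

A subclass that leaves any of the three methods unimplemented cannot be 
instantiated, which is the sense in which users "must" implement them.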

Tests for the harness are specified via a Perl data structure, generally stored 
in a separate file.  This data structure is a hash that contains an array of 
groups.  Each group is a hash that contains an array of tests.  Each test is a 
hash.  At every level (top-level hash, group, and test) users are free to 
define their own keys.  These keys are then used by their implementation of 
TestDriver to run the tests, generate the benchmarks, and compare the results.
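
The nesting can be pictured with the following sketch, written as a Python 
dictionary rather than a Perl hash.  Every key name and value here (driver, 
groups, name, tests, num, pig, sql) is a hypothetical choice for illustration. 
The harness description above only fixes the hash / array-of-groups / 
array-of-tests shape.

```python
# Hypothetical test configuration: a hash holding an array of groups,
# each group a hash holding an array of tests, each test itself a hash.
# All key names and script contents are illustrative assumptions.
config = {
    "driver": "Pig",
    "groups": [
        {
            "name": "Filter",
            "tests": [
                {"num": 1,
                 "pig": "a = load 'in'; b = filter a by $0 > 10; store b into 'out';",
                 "sql": "select * from t where c0 > 10;"},
                {"num": 2,
                 "pig": "a = load 'in'; b = filter a by $0 < 5; store b into 'out';",
                 "sql": "select * from t where c0 < 5;"},
            ],
        },
        {
            "name": "Join",
            "tests": [
                {"num": 1,
                 "pig": "a = load 'l'; b = load 'r'; c = join a by $0, b by $0; store c into 'out';",
                 "sql": "select * from l join r on l.c0 = r.c0;"},
            ],
        },
    ],
}


def iter_tests(cfg):
    """Yield (group_name, test) pairs in declaration order."""
    for group in cfg["groups"]:
        for test in group["tests"]:
            yield group["name"], test
```

A driver implementation would walk this structure and read whichever keys it 
defined for itself, for example test["pig"] when running and test["sql"] when 
generating the benchmark.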

The Pig-specific portions of this patch provide an implementation of TestDriver 
for Pig called TestDriverPig.  This implementation takes the Pig Latin script 
from each test and runs it on a grid specified as part of the test invocation.  
For a benchmark it takes a SQL query, which is run against a database.  For 
cases where equivalent output cannot be generated by SQL, pre-generated results 
can be used.  The compare function sorts the output of each and calculates an 
MD5 checksum to see if the results are the same.
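
The sort-then-checksum comparison can be sketched as follows, in Python rather 
than the patch's Perl.  The function names, the line-oriented input, and the 
UTF-8 encoding are assumptions.  The patch may read files or invoke external 
tools instead.

```python
import hashlib


def sorted_md5(lines):
    """Return the MD5 hex digest of the given lines after sorting them."""
    digest = hashlib.md5()
    for line in sorted(lines):
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")  # keep record boundaries in the checksum
    return digest.hexdigest()


def outputs_match(test_lines, benchmark_lines):
    """True when both outputs hold the same rows, regardless of order."""
    return sorted_md5(test_lines) == sorted_md5(benchmark_lines)
```

Sorting first matters because neither the grid nor the database guarantees row 
order unless the query asks for one.  Duplicate rows still count, so outputs 
that differ only in how often a row appears do not match.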

The remaining work is mainly in the area of deployment: deploying a Hadoop 
instance, constructing a database, populating both with the test data, and then 
running the tests.


> Pig needs a tool for doing end-to-end testing efficiently
> ---------------------------------------------------------
>
>                 Key: PIG-1899
>                 URL: https://issues.apache.org/jira/browse/PIG-1899
>             Project: Pig
>          Issue Type: Test
>          Components: tools
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>         Attachments: e2e.patch
>
>
> Pig currently uses JUnit for all testing.  JUnit is good for unit tests, but 
> limited for end-to-end and integration testing.
> Building an end-to-end test in JUnit is cumbersome (it requires a lot of 
> setup using MiniCluster).  Because expected results must be known beforehand 
> and crafted by hand, they must be kept very small, usually ten rows or 
> fewer.  This does not lead to realistic testing scenarios.
> A test tool is needed that allows the test developer to write a Pig Latin 
> script and specify a source of truth against which to test the results of 
> running this Pig Latin script.  A database or a previous version of Pig can 
> then be used as that source of truth.  This will allow developers to quickly 
> add new tests that return more than trivial results.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
