[
https://issues.apache.org/jira/browse/PIG-1899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alan Gates updated PIG-1899:
----------------------------
Attachment: e2e.patch
The attached patch contains a testing tool developed by the Yahoo Pig team to
handle end-to-end testing. In its current form it is neither complete nor
usable. In particular, I had to remove all of the code dealing with deploying
the test cluster and database, since that was Yahoo-specific. The patch is
placed here as a starting point.
The patch contains two separate pieces: a test harness and Pig-specific
components. The test harness is a fairly simple set of Perl modules that
export one main class, TestDriver. Users must extend this class and implement
its three abstract methods, runTest, generateBenchmark, and compare. runTest
runs the test. generateBenchmark generates results from the source of truth.
compare compares the outputs of runTest and generateBenchmark and decides if
the test succeeded or failed.
Tests for the harness are specified via a Perl data structure, generally stored
in a separate file. This data structure is a hash that contains an array of
groups. Each group is a hash that contains an array of tests. Each test is a
hash. At each hash level users are free to define their own keys in the
hash. These keys are then used by their implementation of TestDriver to run
the tests, generate the benchmarks, and compare the results.
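
As a rough picture of that nesting, here is the same shape written as a Python
dictionary rather than a Perl hash. Every key name and value below ("driver",
"groups", "name", "tests", "num", "pig", "sql") is an assumption made for the
example, not taken from the patch.

```python
# Top-level hash -> array of groups -> array of tests -> test hash.
# All keys here are illustrative; users define whatever keys their
# TestDriver implementation needs.
config = {
    "driver": "Pig",
    "groups": [
        {
            "name": "FilterTests",
            "tests": [
                {"num": 1, "pig": "b = filter a by x > 10;",
                 "sql": "select * from a where x > 10"},
                {"num": 2, "pig": "b = filter a by x < 5;",
                 "sql": "select * from a where x < 5"},
            ],
        },
        {
            "name": "GroupTests",
            "tests": [
                {"num": 1, "pig": "b = group a by x;",
                 "sql": "select x from a group by x"},
            ],
        },
    ],
}


def iter_tests(cfg):
    """Walk the structure and yield (group name, test hash) pairs."""
    for group in cfg["groups"]:
        for test in group["tests"]:
            yield group["name"], test


for group_name, test in iter_tests(config):
    print(f"{group_name}_{test['num']}")
```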
The Pig-specific portions of this patch provide an implementation of TestDriver
for Pig called TestDriverPig. This implementation takes a Pig Latin script in
each test and runs it on a grid specified as part of the test invocation. For
a benchmark it takes a SQL query which is run against a database. For cases
where equivalent output cannot be generated by SQL, pre-generated results can be
used. The compare function sorts the output of each and calculates an md5
checksum to see if the results are the same.
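
The sort-then-checksum comparison can be sketched like this. It is a minimal
Python illustration, assuming each output is available as a list of text rows;
the function names are hypothetical and the real TestDriverPig code is Perl.

```python
import hashlib


def sorted_md5(rows):
    """Sort the rows, then return the md5 hex digest of the sorted output."""
    digest = hashlib.md5()
    for row in sorted(rows):
        digest.update(row.encode("utf-8"))
        digest.update(b"\n")  # keep row boundaries in the checksum
    return digest.hexdigest()


def compare(test_output, benchmark_output):
    """Treat the test as passed when both sorted outputs hash identically."""
    return sorted_md5(test_output) == sorted_md5(benchmark_output)
```

Sorting first makes the check independent of row order, which matters because
neither side guarantees an ordering unless the query or script asks for one.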
The remaining work is mainly in the area of deployment: deploying a Hadoop
instance, constructing a database, populating both with the test data, and
then running the tests.
> Pig needs a tool for doing end to end testing efficiently
> ---------------------------------------------------------
>
> Key: PIG-1899
> URL: https://issues.apache.org/jira/browse/PIG-1899
> Project: Pig
> Issue Type: Test
> Components: tools
> Reporter: Alan Gates
> Assignee: Alan Gates
> Attachments: e2e.patch
>
>
> Pig currently uses junit for all testing. junit is good for unit tests, but
> limited for end to end and integration testing.
> Building an end to end test in junit is cumbersome (a lot of setup and such
> to do using MiniCluster). Given that expected results must be known
> beforehand and hand crafted, they must be kept very small, usually ten or
> fewer rows. This does not lead to realistic testing scenarios.
> A test tool is needed that allows the test developer to write a Pig Latin
> script and specify a source of truth against which to test the results of
> running this Pig Latin script. A database or a previous version of Pig can
> then be used as that source of truth. This will allow developers to quickly
> add new tests that return more than trivial results.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira