[
https://issues.apache.org/jira/browse/PIG-3638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13856801#comment-13856801
]
Sergey commented on PIG-3638:
-----------------------------
I suppose that our approach won't fit the current PigUnit idiom.
We use Groovy, and we have wrapped
{code}
def pigServer = new PigServer(ExecType.LOCAL)
{code}
in our PigScriptTest class, which feeds the script to PigServer:
{code}
pigServer.registerScript(new FileInputStream(scriptFile.absolutePath), params, null)
for (ExecJob job : pigServer.executeBatch())
{
    while (!job.hasCompleted())
    {
        TimeUnit.SECONDS.sleep(1)
    }
    if (job.status != ExecJob.JOB_STATUS.COMPLETED)
    {
        return PigExecutionResult.failed()
    }
}
{code}
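The polling loop above waits indefinitely if a job never reports completion. A minimal sketch of the same wait-then-check pattern with a deadline is shown here in plain Java, which is callable from Groovy. The `JobWaiter` class and its `Job` interface are hypothetical stand-ins for illustration only; they are not part of the Pig API or of the PigScriptTest class described in this comment.

```java
import java.util.List;
import java.util.concurrent.TimeUnit;

public class JobWaiter {

    /** Hypothetical stand-in for a batch job; not a Pig type. */
    public interface Job {
        boolean hasCompleted();
        boolean succeeded();
    }

    /**
     * Polls each job until it completes or the shared deadline passes.
     * Returns true only if every job completed successfully in time.
     */
    public static boolean waitForAll(List<? extends Job> jobs,
                                     long timeoutMillis,
                                     long pollMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        for (Job job : jobs) {
            while (!job.hasCompleted()) {
                if (System.nanoTime() - deadline >= 0) {
                    return false; // timed out before this job completed
                }
                try {
                    TimeUnit.MILLISECONDS.sleep(pollMillis);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and treat the wait as failed.
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
            if (!job.succeeded()) {
                return false; // job completed, but not successfully
            }
        }
        return true;
    }
}
```

In a wrapper like the one above, the `Job` methods could delegate to `ExecJob.hasCompleted()` and to a comparison of the job status against `ExecJob.JOB_STATUS.COMPLETED`; that adapter is left out because it depends on how the wrapper is structured.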
It's more of a data-driven test than a unit test. The major advantage is that we
can use any Storage/Loader utilities in the script, and the script can go to
production without any modification.
A typical Pig test looks like this:
{code}
class FilterEnrichXvlrEventsTest
{
    @Test(groups = ['integration'])
    public void test01()
    {
        // "xvlr_data" uses a special converter from CSV to SequenceFile: test data
        // is easier to manage as CSV than as a binary seq file. The script itself
        // uses the Twitter SequenceFile readers.
        def test = pigScriptTest("filter_enrich_xvlr_events.pig", "test01")
            .withInput("xvlr_data", [format: new FormatMetadata(
                inputFormatType: FormatType.CSV,
                outputFormatType: FormatType.SEQ,
                keyClass: NullWritable.class,
                valueClass: Text)])
            .withInput("lol", "lol.avro") // Avro input for AvroStorage
            .withOutput("out_lte") // the script has several output STORE statements
            .withOutput("out")
        def result = test.run()
        assertThat(result, is(completed()))
        assertThat(result, hasOutput("out").notContains("xxx"))
        assertThat(result, hasOutput("out").contains("yyy"))
    }
}
{code}
> Improve PigUnit
> ---------------
>
> Key: PIG-3638
> URL: https://issues.apache.org/jira/browse/PIG-3638
> Project: Pig
> Issue Type: New Feature
> Reporter: Sergey
>
> Hi. I have some suggestions to improve PigUnit:
> 1. Add default functionality to feed several inputs to one script. I didn't
> find a way to do it using the existing API and had to extend it.
> 2. Allow the use of "native" loaders. There are plenty of bugs when you start
> running your script in prod with AvroStorage or any other complicated storage.
> You could catch many schema/type-related bugs at the unit-test level.
> 3. The same for storage.