[ 
https://issues.apache.org/jira/browse/PIG-3638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13856801#comment-13856801
 ] 

Sergey commented on PIG-3638:
-----------------------------

I suspect our approach won't fit the current PigUnit idiom.
We use Groovy and wrap
{code}
def pigServer = new PigServer(ExecType.LOCAL)
{code}
in our PigScriptTest class, which feeds the script to PigServer:
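{code}
// Register the script with its parameters; no parameter files are passed (null).
pigServer.registerScript(new FileInputStream(scriptFile.absolutePath), params, null)

// Run every job in the batch and poll until each one finishes.
for (ExecJob job : pigServer.executeBatch())
{
    while (!job.hasCompleted())
    {
        TimeUnit.SECONDS.sleep(1)
    }

    // Any job that did not end in COMPLETED fails the whole run.
    if (job.status != ExecJob.JOB_STATUS.COMPLETED)
    {
        return PigExecutionResult.failed()
    }
}
{code}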
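
The polling loop above has no upper bound, so a job that never reports completion would hang the test run. A minimal sketch of a bounded variant follows, written in plain Java so that it runs without Pig on the classpath. The Job interface and the JobWaiter.awaitAll name are hypothetical stand-ins for ExecJob and are not part of Pig or of the PigScriptTest class described here.

```java
import java.util.List;
import java.util.concurrent.TimeUnit;

final class JobWaiter {

    /** Hypothetical stand-in for the two ExecJob checks used in the loop above. */
    interface Job {
        boolean hasCompleted(); // mirrors ExecJob.hasCompleted()
        boolean succeeded();    // mirrors status == ExecJob.JOB_STATUS.COMPLETED
    }

    /**
     * Polls each job until it completes, sharing one deadline across all jobs.
     * Returns true only if every job completed successfully before the deadline.
     */
    static boolean awaitAll(List<? extends Job> jobs, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        for (Job job : jobs) {
            while (!job.hasCompleted()) {
                if (System.nanoTime() - deadline >= 0) {
                    return false; // deadline passed while a job was still running
                }
                try {
                    TimeUnit.MILLISECONDS.sleep(pollMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    return false;
                }
            }
            if (!job.succeeded()) {
                return false; // job finished, but not successfully
            }
        }
        return true;
    }
}
```

In the Groovy wrapper, a small adapter around ExecJob could implement Job, and a false result would map to PigExecutionResult.failed(). Whether a timeout should count as a failure or be reported separately is a design choice this sketch does not settle.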

It's more of a data-driven test than a unit test. The major advantage is that 
we can use any storage/loader utilities in the script, and the script can go to 
production without any modification.

A typical Pig test looks like this:
{code}
class FilterEnrichXvlrEventsTest
{

    @Test(groups = ['integration'])
    public void test01()
    {
        // Special converter from CSV to SequenceFile. It's easier to manage
        // test data stored as CSV than as a binary seq file. We use Twitter
        // SequenceFile readers in this script.
        def xvlrFormat = new FormatMetadata(inputFormatType:  FormatType.CSV,
                                            outputFormatType: FormatType.SEQ,
                                            keyClass:         NullWritable.class,
                                            valueClass:       Text)

        def test =
            pigScriptTest("filter_enrich_xvlr_events.pig", "test01")
                .withInput("xvlr_data", [format: xvlrFormat])
                .withInput("lol", "lol.avro") // Avro input for AvroStorage
                .withOutput("out_lte") // several output STORE statements in the script
                .withOutput("out")

        def result = test.run()

        assertThat(result, is(completed()))

        assertThat(result, hasOutput("out").notContains("xxx"))
        assertThat(result, hasOutput("out").contains("yyy"))

    }
}
{code}


> Improve PigUnit
> ---------------
>
>                 Key: PIG-3638
>                 URL: https://issues.apache.org/jira/browse/PIG-3638
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Sergey
>
> Hi. I have some suggestions to improve PigUnit:
> 1. Add default functionality to feed several inputs to one script. I didn't 
> find a way to do this using the existing API and had to extend it.
> 2. Allow the use of "native" loaders. Plenty of bugs surface when you start 
> running your script in prod with AvroStorage or any other complicated storage. 
> You could catch many schema/type-related bugs at the unit-test level.
> 3. The same for storage.



