Also in continuation to our talk last week, it seems we are doing something
inefficient in MR. I believe that should be a separate issue other than
STATUS_NULL, it is more than a semantic cleanup (actually semantic is clear
in existing code, just need some cleanup).

Currently what Pig does is:
1. pull one record from POPackage/POJoinPackage
2. attach the record to pipeline
3. pull the leaf of pipeline until EOP
4. return to #1

Seems to have a EOP to go through the entire pipeline for each
POPackage/POJoinPackage output is unnecessary. It can be optimized. It
would be interesting to explore the opportunity of the optimization here.

Thanks,
Daniel


On Tue, Nov 12, 2013 at 2:38 PM, Daniel Dai <[email protected]> wrote:

> We shall definitely clear the ambiguity. +1 for #1.
>
> For #2, I did some digging into POPackage. What I see is in POPackage we
> produce STATUS_NULL, but not POJoinPackage. And the return value for
> POPackage is not used:
> if (pack instanceof POJoinPackage) {
>     while (true)
>     {
>         ....
>         if (processOnePackageOutput(context))
>             break;
>     }
> }
> else {
>      ....
>      processOnePackageOutput(context);
> }
>
> So it seems STATUS_NULL is not effectively in use in
> POPackage/POJoinPackage. We shall always use STATUS_EOP to indicate the end
> of pull (for both POPackage and pipeline). We shall not use STATUS_NULL
> anywhere once we make #1 clear. Am I right?
>
> Thanks,
> Daniel
>
>
> On Tue, Nov 12, 2013 at 8:35 AM, Cheolsoo Park <[email protected]>wrote:
>
>> >> I'll post a patch that makes the second option official and changes all
>> the examples of the first to return null with STATUS_OK (this is what
>> happens with the constant null currently).
>>
>> I am +1 for this.
>>
>>
>> On Fri, Nov 8, 2013 at 10:16 PM, Mark Wagner <[email protected]
>> >wrote:
>>
>> > Hi Devs,
>> >
>> > While discussing the Pig on Tez project today, we were unable to reach
>> > a conclusion on what the semantics of what POStatus.STATUS_NULL are
>> > and would like to ask those who remember the history of Pig to chime
>> > in. The two interpretations are:
>> >
>> > 1. POStatus.STATUS_NULL indicates that the pulled output IS null. This
>> > is used in some places like EqualToExpr.
>> >
>> > 2. POStatus.STATUS_NULL indicates that the pull did not produce any
>> > output. This is backed up by it's usage in POPackage for flattening an
>> > empty bag, and PigGenericMapBase where pulls on the operator pipeline
>> > that result in STATUS_NULL are discarded.
>> >
>> > If nobody has any concrete documentation or explanation for the
>> > discrepancy, I'll post a patch that makes the second option official
>> > and changes all the examples of the first to return null with
>> > STATUS_OK (this is what happens with the constant null currently).
>> >
>> > Thanks,
>> > Mark
>> >
>>
>
>

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Reply via email to