This is documented behavior, not a bug: COUNT ignores nulls, so a tuple whose first field is null is not counted. To include those tuples in the count, use the built-in COUNT_STAR function instead.
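To make the difference concrete, here is a minimal Java sketch that models the two counting rules on the sample data from the thread. It is not the Pig source: the class name CountSketch and both helper methods are hypothetical, a bag is modeled as a plain list of lists, and countStar only assumes the documented "count every tuple" behavior of COUNT_STAR.

```java
import java.util.Arrays;
import java.util.List;

public class CountSketch {
    // Mirrors the check quoted from COUNT.java in this thread: a tuple is
    // counted only if it is non-null, non-empty, and its first field is non-null.
    static long count(List<List<Object>> bag) {
        long cnt = 0;
        for (List<Object> t : bag) {
            if (t != null && t.size() > 0 && t.get(0) != null) {
                cnt++;
            }
        }
        return cnt;
    }

    // Assumed COUNT_STAR-like rule: every tuple in the bag is counted,
    // regardless of null fields.
    static long countStar(List<List<Object>> bag) {
        return bag.size();
    }

    public static void main(String[] args) {
        // Group "taobao1": two rows, platform is set in both.
        List<List<Object>> taobao1 = Arrays.asList(
            Arrays.<Object>asList("android", "u1", "taobao1"),
            Arrays.<Object>asList("android", "u1", "taobao1"));
        // Group "taobao2": one row whose first field (platform) is null,
        // which is how PigStorage loads an empty field.
        List<List<Object>> taobao2 = Arrays.asList(
            Arrays.<Object>asList(null, "u2", "taobao2"));

        System.out.println(count(taobao1) + " " + countStar(taobao1)); // 2 2
        System.out.println(count(taobao2) + " " + countStar(taobao2)); // 0 1
    }
}
```

In the original script, the equivalent change would be replacing COUNT(RR) with COUNT_STAR(RR) for PV. COUNT(ITEMUV) already returns 1 for taobao2 because the only field in that bag, machineID, is not null.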
http://pig.apache.org/docs/r0.11.1/func.html#count

On Mon, Sep 16, 2013 at 12:22 PM, Daniel Dai <da...@hortonworks.com> wrote:

> It is COUNT.java:105:
>     if (t != null && t.size() > 0 && t.get(0) != null)
>
> Seems we don't count tuple with first field null. Not sure why this happen
> but I would think it a bug.
>
> Thanks,
> Daniel
>
>
> On Sun, Sep 15, 2013 at 8:40 PM, centerqi hu <cente...@gmail.com> wrote:
>
> > The sample.txt file content:
> >
> > android,u1,taobao1
> > android,u1,taobao1
> > ,u2,taobao2
> >
> > RR = LOAD '/user/www/udc/output/bugfind/sample.txt' USING PigStorage(',')
> >     as (platform, machineID, productID);
> > RB = GROUP RR BY (productID);
> > RES = FOREACH RB {
> >     ITEMUV = DISTINCT RR.machineID;
> >     GENERATE flatten(group), COUNT(ITEMUV) AS UV, COUNT(RR) AS PV;
> > };
> > DUMP RES;
> >
> > OUTPUT:
> >
> > (taobao1,1,2)
> > (taobao2,1,0)
> >
> > Why taobao2 the pv is 0, but uv is 1?
> >
> > I view the source code of the COUNT function
> >
> > If the first column is null, cnt will not increase
> >
> > while (it.hasNext()){
> >     Tuple t = (Tuple)it.next();
> >     if (t != null && t.size() > 0 && t.get(0) != null )
> >         cnt++;
> > }
> >
> > --
> > cente...@gmail.com|齐忠