Thanks Thejas! https://issues.apache.org/jira/browse/PIG-2668
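
For anyone finding this thread later: the workaround Thejas mentions in the quoted thread (a separate load statement as input for each filter) might look roughly like the sketch below. The table name 'mytable' is made up, the partition values are just taken from Rajesh's partition list as examples, and the HCatLoader package name is what I believe HCatalog 0.4 uses; I have not verified this on a cluster.

```pig
-- Sketch only: 'mytable' is a hypothetical table partitioned by (d string).
-- Each filter reads from its own load, so no Split operator sits
-- between the filter and the loader in the logical plan.
A1 = LOAD 'mytable' USING org.apache.hcatalog.pig.HCatLoader();
B1 = FILTER A1 BY d == '20120415';

A2 = LOAD 'mytable' USING org.apache.hcatalog.pig.HCatLoader();
B2 = FILTER A2 BY d == '20111225';

-- Combine the per-partition results, as in the original filter-union script.
B = UNION B1, B2;
```

The idea is only that each filter directly follows a load, which is the shape Thejas says lets the filter reach the load function; whether the scan is actually pruned is worth confirming from the job counters.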

On Wed, Apr 25, 2012 at 2:04 PM, Thejas Nair wrote:

> yes, please create one.
>
> Thanks,
> Thejas
>
> On 4/25/12 1:47 PM, Aniket Mokashi wrote:
>
>> Hi Dmitriy and Thejas,
>>
>> Should I open a jira for the same?
>>
>> Thanks,
>> Aniket
>>
>> On Wed, Apr 25, 2012 at 1:45 PM, Dmitriy Ryaboy wrote:
>>
>>> Yeah I think we just need to get projection pushdown to work
>>> through Split operators.
>>>
>>> D
>>>
>>> On Wed, Apr 25, 2012 at 12:52 PM, Thejas Nair wrote:
>>>
>>>> cc'ing dev@pig as this is a pig issue.
>>>>
>>>> Aniket, what you saw is not related to PIG-2339.
>>>>
>>>> In your example query, the logical plan will look like this -
>>>>
>>>>             Load (A)
>>>>                |
>>>>              Split
>>>>                |
>>>>   ---------------------------
>>>>   |                         |
>>>> Filter(B1)             Filter(B2) ...
>>>>
>>>> Because of the split operator introduced between the filter
>>>> conditions and load, the filter does not get pushed into the load
>>>> function.
>>>>
>>>> A simple way to fix this in pig would be to not share the load
>>>> across the filter operators. Another option is to push the
>>>> condition (B1 or B2 or B3) into the Load operator and retain the
>>>> rest of the current plan (split and filters following the split).
>>>>
>>>> You can of course achieve the same effect by having a separate
>>>> load statement as input for each of the filters.
>>>>
>>>> I agree that we should make it possible to ask pig to throw a
>>>> warning/error if the query is going to result in a full table scan
>>>> on a partitioned table.
>>>>
>>>> Thanks,
>>>> Thejas
>>>>
>>>> On 4/24/12 7:56 PM, Aniket Mokashi wrote:
>>>>
>>>>> Sorry Thejas, I didn't look into the jira properly earlier.
>>>>> EMR pig-0.9.1 already has that patch for PIG-2339 and hence I did
>>>>> not hit that issue earlier (and I patched datanucleus).
>>>>> filter-union was a workaround I was using to avoid some of the
>>>>> thrift timeout problems earlier. Thrift API calls time out on the
>>>>> client side after 20 sec by default (I found the config to change
>>>>> this later), and hence I used
>>>>>
>>>>>   a = load 'table'; b1 = filter a by cond1; b2 = filter a by cond2; ...
>>>>>   b = union b1, b2, ...;
>>>>>
>>>>> expecting these filters to be pushed separately to the loader.
>>>>> But that doesn't work in pig. (I can open a jira, but I haven't
>>>>> done enough investigation at the code level.) Thoughts?
>>>>>
>>>>> Thanks,
>>>>> Aniket
>>>>>
>>>>> On Tue, Apr 24, 2012 at 7:00 PM, Thejas Nair wrote:
>>>>>
>>>>>> The issue was not specific to filter-union -
>>>>>> https://issues.apache.org/jira/browse/PIG-2339
>>>>>> The fix was to run PushUpFilter before PartitionFilterOptimizer.
>>>>>>
>>>>>> As this is not a hcat issue, it should not matter if you have an
>>>>>> older hcat version. FYI, this bug was not there in pig 0.8.x.
>>>>>> Was it pig 0.9.0 or 0.9.1 that you used?
>>>>>>
>>>>>> Thanks,
>>>>>> Thejas
>>>>>>
>>>>>> On 4/24/12 5:21 PM, Aniket Mokashi wrote:
>>>>>>
>>>>>>> Hi Thejas,
>>>>>>>
>>>>>>> Can you point me to the jira that fixes the filter-union problem
>>>>>>> (in pig)? I haven't tried hcat-0.4 yet, good to know about that
>>>>>>> issue. I will keep a watcher.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Aniket
>>>>>>>
>>>>>>> On Tue, Apr 24, 2012 at 4:51 PM, Thejas Nair wrote:
>>>>>>>
>>>>>>>> Hi Aniket,
>>>>>>>> Are you using pig 0.9 or 0.9.1?
>>>>>>>> If yes, can you try with pig 0.9.2?
>>>>>>>> Wondering if you are also hitting the issue that Thomas
>>>>>>>> mentioned.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Thejas
>>>>>>>>
>>>>>>>> On 4/23/12 7:39 PM, Aniket Mokashi wrote:
>>>>>>>>
>>>>>>>>> Something similar I have noticed is -
>>>>>>>>>
>>>>>>>>> A = load ...
>>>>>>>>> B1 = filter A by cond1;
>>>>>>>>> B2 = filter A by cond2;
>>>>>>>>> B3 = filter A by cond3;
>>>>>>>>>
>>>>>>>>> B = union B1, B2, B3; does not push projection.
>>>>>>>>>
>>>>>>>>> Is that expected?
>>>>>>>>>
>>>>>>>>> Ideally, we should have a "strict" mode under hcatalog that,
>>>>>>>>> when turned on, will avoid executing pig queries on the full
>>>>>>>>> (partitioned) table.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Aniket
>>>>>>>>>
>>>>>>>>> On Mon, Apr 23, 2012 at 7:32 PM, Rajesh Balamohan wrote:
>>>>>>>>>
>>>>>>>>>> Hi Alan,
>>>>>>>>>>
>>>>>>>>>> Thanks for the quick response.
>>>>>>>>>>
>>>>>>>>>> I am using HCatalog 0.4.
>>>>>>>>>>
>>>>>>>>>> With a simple PIG script it works great. HCatalog
>>>>>>>>>> beautifully scans only the relevant information. However,
>>>>>>>>>> a full scan happens only when we have a couple of additional
>>>>>>>>>> joins and when we change the INNER JOIN order (we also use
>>>>>>>>>> "using skewed").
>>>>>>>>>>
>>>>>>>>>> Though we have looked into the debug logs, we saw the
>>>>>>>>>> number of records scanned from the JobTracker's counters
>>>>>>>>>> itself. Without pruning, the m/r job was pretty much
>>>>>>>>>> scanning the entire set of rows.
>>>>>>>>>>
>>>>>>>>>> I am not sure if there is a corner case where the "skewed"
>>>>>>>>>> join is trying to override the filtering.
>>>>>>>>>>
>>>>>>>>>> ~Rajesh.B
>>>>>>>>>>
>>>>>>>>>> On Tue, Apr 24, 2012 at 2:13 AM, Alan Gates wrote:
>>>>>>>>>>
>>>>>>>>>>> What version of HCatalog are you using? How do you know
>>>>>>>>>>> it is scanning all the partitions, does it say so in the
>>>>>>>>>>> logs, or are you getting all the records back?
>>>>>>>>>>>
>>>>>>>>>>> And yes, HCat is supposed to do partition pruning so that
>>>>>>>>>>> it only scans the required partitions.
>>>>>>>>>>>
>>>>>>>>>>> Alan.
>>>>>>>>>>>
>>>>>>>>>>> On Apr 21, 2012, at 8:27 PM, Rajesh Balamohan wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>
>>>>>>>>>>>> I have a hcatalog table "partitioned by (d string)".
>>>>>>>>>>>>
>>>>>>>>>>>> I have a couple of days' worth of data, and when I run
>>>>>>>>>>>> "show partitions" it provides the correct data.
>>>>>>>>>>>>
>>>>>>>>>>>> d=20111215
>>>>>>>>>>>> d=20111216
>>>>>>>>>>>> d=20111217
>>>>>>>>>>>> d=20111218
>>>>>>>>>>>> d=20111219
>>>>>>>>>>>> d=20111220
>>>>>>>>>>>> d=20111221
>>>>>>>>>>>> d=20111222
>>>>>>>>>>>> d=20111223
>>>>>>>>>>>> d=20111224
>>>>>>>>>>>> d=20111225
>>>>>>>>>>>> d=20120415
>>>>>>>>>>>>
>>>>>>>>>>>> However, when I run PIG with "filter a by d == '20120415'",
>>>>>>>>>>>> it ends up scanning all data.
>>>>>>>>>>>>
>>>>>>>>>>>> Is this a known bug/enhancement in HCatalog? Ideally,
>>>>>>>>>>>> shouldn't it scan only the d=20120415 directory?
>>>>>>>>>>>>
>>>>>>>>>>>> Any pointers would be of great help.
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> ~Rajesh.B

--
"...:::Aniket:::... Quetzalco@tl"
