Here is my query :* select table1.column10, table1.column11, count(distinct table2.column3) as total from table1 JOIN table2 ON (table1.column1='value1' and to_date(table1.column2) >= '2011-01-06' and to_date(table1.column2) < '2011-01-07' and table2.column3!='' and table2.column3 is NOT NULL and table1.coulmn4 !='A' and table1.column5 != '' and table2.column3=table1.column3 and ((table1.coulmn6 in ('val1') and to_date(table2.column7)='2011-01-06') or (table1.coulmn6 in ('val2') and to_date(table2.column8)='2011-01-06') or (table1.coulmn6 in ('val3') and to_date(table2.column9)='2011-01-06'))) where group by table1.column10, table1.column11;*
Thanks, MIS. On Wed, Mar 23, 2011 at 12:54 PM, Ning Zhang <nzh...@fb.com> wrote: > What is your OR conditions? Are they involving both tables? Can you post > your query here? > > On Mar 23, 2011, at 12:04 AM, MIS wrote: > > > Ning, Thanks for the reply. > > Yes. you are right. Using NOT and AND didn't work as expected. > > I'll give a try in implementing nested-loop map-side join. > > > > In the meanwhile, I brought out the expression using OR from inside the > JOIN > > expression to be used in the filtering expression {in the WHERE clause }, > > but I see some difference in the results produced with what is expected. > > Since, I'm not using an OUTER join, I expected desired results. Any idea > > why is the OR expression in filtering not working as desired ? Any > thought > > on this are welcome. > > > > Thanks, > > MIS. > > > > > > On Wed, Mar 23, 2011 at 10:28 AM, Ning Zhang <nzh...@fb.com> wrote: > > > >> Joins with OR conditions are not supported by Hive currently. I think > even > >> though you rewrite the condition to use NOT and AND only, the results > may be > >> wrong. > >> > >> It is quite hard to implement joins of any tables with OR conditions in > a > >> MapReduce framework. it is straightforward to implement it in > nested-loop > >> join, but due to the nature of distributed processing, nested loop join > >> cannot be implemented in an efficient and scalable way in MapReduce. In > >> nested-loop join, each mapper need to join a split of LHS table with the > >> whole RHS table which could be terabytes. > >> > >> The regular (reduce-side) join in Hive is essentially a sort-merge join > >> operator. With that in mind, it's hard to implement OR conditions in the > >> sort-merge join. > >> > >> One exception is the map-side join, which assumes the RHS table is small > >> and will be read fully into each mapper. Currently map-side join in Hive > is > >> a hash-based join operator. You can implement a nested-loop map-side > join > >> operator to enable any join conditions including OR. > >> > >> On Mar 22, 2011, at 1:39 AM, MIS wrote: > >> > >>> Found it at *org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.java* > line > >>> no. 1122 > >>> There is some concern mentioned that supporting OR would lead to data > >>> explosion. Is it discussed/documneted in a little more detail somewhere > ? > >> If > >>> so, some pointers towards the same will be helpful. > >>> > >>> Thanks, > >>> MIS. > >>> > >>> On Tue, Mar 22, 2011 at 1:19 PM, MIS <misapa...@gmail.com> wrote: > >>> > >>>> I want to use OR in the join expression, but it seems only AND is > >> supported > >>>> as of now. > >>>> I have a work around though to use DeMorgan's law {C1 OR C2 = !(!C1 > AND > >>>> !C2))} , but it would be nice if somebody can point me to the location > >> in > >>>> code base that would need modification to support the OR in the join > >>>> expression. > >>>> > >>>> Thanks, > >>>> MIS. > >>>> > >> > >> > >