Here is my query :*
select table1.column10, table1.column11, count(distinct table2.column3) as
total from table1 JOIN table2 ON (table1.column1='value1' and
to_date(table1.column2) >= '2011-01-06'  and to_date(table1.column2) <
'2011-01-07' and  table2.column3!='' and table2.column3 is NOT NULL and
table1.coulmn4 !='A' and table1.column5 != '' and
table2.column3=table1.column3 and ((table1.coulmn6 in ('val1') and
to_date(table2.column7)='2011-01-06') or (table1.coulmn6 in ('val2') and
to_date(table2.column8)='2011-01-06') or (table1.coulmn6 in ('val3') and
to_date(table2.column9)='2011-01-06'))) where group by table1.column10,
table1.column11;*

Thanks,
MIS.


On Wed, Mar 23, 2011 at 12:54 PM, Ning Zhang <nzh...@fb.com> wrote:

> What is your OR conditions? Are they involving both tables? Can you post
> your query here?
>
> On Mar 23, 2011, at 12:04 AM, MIS wrote:
>
> > Ning, Thanks for the reply.
> > Yes. you are right. Using NOT and AND didn't work as expected.
> > I'll give a try in implementing nested-loop map-side join.
> >
> > In the meanwhile, I brought out the expression using OR from inside the
> JOIN
> > expression to be used in the filtering expression {in the WHERE clause },
> > but I see some difference in the results produced with what is expected.
> > Since,  I'm not using an OUTER join, I expected desired results. Any idea
> > why is the OR expression in filtering not working as desired ? Any
> thought
> > on this are welcome.
> >
> > Thanks,
> > MIS.
> >
> >
> > On Wed, Mar 23, 2011 at 10:28 AM, Ning Zhang <nzh...@fb.com> wrote:
> >
> >> Joins with OR conditions are not supported by Hive currently. I think
> even
> >> though you rewrite the condition to use NOT and AND only, the results
> may be
> >> wrong.
> >>
> >> It is quite hard to implement joins of any tables with OR conditions in
> a
> >> MapReduce framework. it is straightforward to implement it in
> nested-loop
> >> join, but due to the nature of distributed processing, nested loop join
> >> cannot be implemented in an efficient and scalable way in MapReduce. In
> >> nested-loop join, each mapper need to join a split of LHS table with the
> >> whole RHS table which could be terabytes.
> >>
> >> The regular (reduce-side) join in Hive is essentially a sort-merge join
> >> operator. With that in mind, it's hard to implement OR conditions in the
> >> sort-merge join.
> >>
> >> One exception is the map-side join, which assumes the RHS table is small
> >> and will be read fully into each mapper. Currently map-side join in Hive
> is
> >> a hash-based join operator. You can implement a nested-loop map-side
> join
> >> operator to enable any join conditions including OR.
> >>
> >> On Mar 22, 2011, at 1:39 AM, MIS wrote:
> >>
> >>> Found it at  *org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.java*
> line
> >>> no. 1122
> >>> There is some concern mentioned that supporting OR would lead to data
> >>> explosion. Is it discussed/documneted in a little more detail somewhere
> ?
> >> If
> >>> so, some pointers towards the same will be helpful.
> >>>
> >>> Thanks,
> >>> MIS.
> >>>
> >>> On Tue, Mar 22, 2011 at 1:19 PM, MIS <misapa...@gmail.com> wrote:
> >>>
> >>>> I want to use OR in the join expression, but it seems only AND is
> >> supported
> >>>> as of now.
> >>>> I have a work around though to use DeMorgan's law {C1 OR C2 = !(!C1
> AND
> >>>> !C2))} , but it would be nice if somebody can point me to the location
> >> in
> >>>> code base that would need modification to support the OR in the join
> >>>> expression.
> >>>>
> >>>> Thanks,
> >>>> MIS.
> >>>>
> >>
> >>
>
>

Reply via email to