Ning,

Thanks for the reply. Yes, you are right. Using NOT and AND didn't work as expected. I'll try implementing a nested-loop map-side join.
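As a rough illustration of the two map-side strategies discussed in this thread, here is a minimal plain-Java sketch. It is not Hive's operator code: the `Row` record and the `hashJoin` / `nestedLoopJoin` methods are hypothetical names, both tables are assumed to be in-memory lists, and the RHS is assumed small enough to be read fully into each mapper. The hash-based version can only match on key equality, while the nested-loop version evaluates an arbitrary predicate (including OR) at the cost of comparing every LHS row with every RHS row.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

public class MapSideJoinSketch {

    // Hypothetical row type for illustration: a join key plus one other column.
    record Row(int key, int other) {}

    // Hash-based map-side join: build a hash table on the small RHS,
    // then probe it with each LHS row. Only supports key equality.
    static List<Row[]> hashJoin(List<Row> lhsSplit, List<Row> smallRhs) {
        Map<Integer, List<Row>> table = new HashMap<>();
        for (Row r : smallRhs) {
            table.computeIfAbsent(r.key(), k -> new ArrayList<>()).add(r);
        }
        List<Row[]> out = new ArrayList<>();
        for (Row l : lhsSplit) {
            for (Row r : table.getOrDefault(l.key(), List.of())) {
                out.add(new Row[] {l, r});
            }
        }
        return out;
    }

    // Nested-loop map-side join: compare each LHS row with every RHS row,
    // so any join condition (including OR) can be evaluated.
    static List<Row[]> nestedLoopJoin(List<Row> lhsSplit, List<Row> smallRhs,
                                      BiPredicate<Row, Row> condition) {
        List<Row[]> out = new ArrayList<>();
        for (Row l : lhsSplit) {
            for (Row r : smallRhs) {
                if (condition.test(l, r)) {
                    out.add(new Row[] {l, r});
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> lhs = List.of(new Row(1, 10), new Row(2, 20), new Row(3, 30));
        List<Row> rhs = List.of(new Row(1, 99), new Row(5, 30));

        // Equality on key: only (1,10) x (1,99) matches.
        System.out.println(hashJoin(lhs, rhs).size()); // prints 1

        // ON l.key = r.key OR l.other = r.other: (3,30) x (5,30) also matches.
        BiPredicate<Row, Row> orCondition =
            (l, r) -> l.key() == r.key() || l.other() == r.other();
        System.out.println(nestedLoopJoin(lhs, rhs, orCondition).size()); // prints 2
    }
}
```

The nested loop does |LHS split| × |RHS| predicate evaluations per mapper, which is why it only makes sense when the RHS is small, as in the map-side join case.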
In the meantime, I moved the OR expression out of the JOIN condition and into the filter expression (the WHERE clause), but the results differ from what I expected. Since I'm not using an OUTER join, I expected to get the desired results. Any idea why the OR expression in the filter is not working as desired? Any thoughts on this are welcome.

Thanks,
MIS.

On Wed, Mar 23, 2011 at 10:28 AM, Ning Zhang <nzh...@fb.com> wrote:
> Joins with OR conditions are not supported by Hive currently. I think even
> though you rewrite the condition to use NOT and AND only, the results may be
> wrong.
>
> It is quite hard to implement joins of any tables with OR conditions in a
> MapReduce framework. It is straightforward to implement it in a nested-loop
> join, but due to the nature of distributed processing, a nested-loop join
> cannot be implemented in an efficient and scalable way in MapReduce. In a
> nested-loop join, each mapper needs to join a split of the LHS table with the
> whole RHS table, which could be terabytes.
>
> The regular (reduce-side) join in Hive is essentially a sort-merge join
> operator. With that in mind, it's hard to implement OR conditions in the
> sort-merge join.
>
> One exception is the map-side join, which assumes the RHS table is small
> and will be read fully into each mapper. Currently map-side join in Hive is
> a hash-based join operator. You can implement a nested-loop map-side join
> operator to enable any join conditions including OR.
>
> On Mar 22, 2011, at 1:39 AM, MIS wrote:
>
> > Found it at *org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.java* line
> > no. 1122
> > There is some concern mentioned that supporting OR would lead to data
> > explosion. Is it discussed/documented in a little more detail somewhere? If
> > so, some pointers towards the same will be helpful.
> >
> > Thanks,
> > MIS.
> >
> > On Tue, Mar 22, 2011 at 1:19 PM, MIS <misapa...@gmail.com> wrote:
> >
> >> I want to use OR in the join expression, but it seems only AND is
> >> supported as of now.
> >> I have a workaround, though: use De Morgan's law {C1 OR C2 = !(!C1 AND
> >> !C2)}, but it would be nice if somebody could point me to the location in
> >> the code base that would need modification to support OR in the join
> >> expression.
> >>
> >> Thanks,
> >> MIS.
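The De Morgan rewrite from the original question, C1 OR C2 = !(!C1 AND !C2), can be checked exhaustively. The sketch below models SQL-style three-valued logic (UNKNOWN standing in for a comparison involving NULL) with a hypothetical `TV` enum and confirms the identity holds for all nine input combinations. This is only a check of the boolean identity, not of Hive's join code; it suggests the wrong results Ning warns about would come from how the join condition is evaluated rather than from the rewrite itself.

```java
public class DeMorganCheck {

    // Hypothetical three-valued truth type (Kleene logic, as used for SQL NULLs).
    enum TV {
        TRUE, FALSE, UNKNOWN;

        TV not() {
            switch (this) {
                case TRUE: return FALSE;
                case FALSE: return TRUE;
                default: return UNKNOWN;
            }
        }

        TV and(TV o) {
            if (this == FALSE || o == FALSE) return FALSE;
            if (this == TRUE && o == TRUE) return TRUE;
            return UNKNOWN;
        }

        TV or(TV o) {
            if (this == TRUE || o == TRUE) return TRUE;
            if (this == FALSE && o == FALSE) return FALSE;
            return UNKNOWN;
        }
    }

    // True if C1 OR C2 equals NOT(NOT C1 AND NOT C2) for every pair of inputs.
    static boolean deMorganHolds() {
        for (TV c1 : TV.values()) {
            for (TV c2 : TV.values()) {
                TV direct = c1.or(c2);
                TV rewritten = c1.not().and(c2.not()).not();
                if (direct != rewritten) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(deMorganHolds()); // prints true
    }
}
```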