What are your OR conditions? Do they involve both tables? Can you post your query here?

On Mar 23, 2011, at 12:04 AM, MIS wrote:

> Ning, Thanks for the reply.
> Yes, you are right. Using NOT and AND didn't work as expected.
> I'll try implementing a nested-loop map-side join.
>
> In the meantime, I moved the OR expression out of the JOIN
> expression and into the filtering expression (in the WHERE clause),
> but the results differ from what I expected.
> Since I'm not using an OUTER join, I expected the desired results. Any idea
> why the OR expression in the filter is not working as desired? Any thoughts
> on this are welcome.
>
> Thanks,
> MIS.
>
>
> On Wed, Mar 23, 2011 at 10:28 AM, Ning Zhang <nzh...@fb.com> wrote:
>
>> Joins with OR conditions are not supported by Hive currently. I think even
>> if you rewrite the condition to use NOT and AND only, the results may be
>> wrong.
>>
>> It is quite hard to implement joins of any tables with OR conditions in a
>> MapReduce framework. It is straightforward to implement in a nested-loop
>> join, but due to the nature of distributed processing, a nested-loop join
>> cannot be implemented in an efficient and scalable way in MapReduce. In a
>> nested-loop join, each mapper needs to join a split of the LHS table with the
>> whole RHS table, which could be terabytes.
>>
>> The regular (reduce-side) join in Hive is essentially a sort-merge join
>> operator. With that in mind, it's hard to implement OR conditions in the
>> sort-merge join.
>>
>> One exception is the map-side join, which assumes the RHS table is small
>> and will be read fully into each mapper. Currently the map-side join in Hive is
>> a hash-based join operator. You can implement a nested-loop map-side join
>> operator to enable any join conditions, including OR.
>>
>> On Mar 22, 2011, at 1:39 AM, MIS wrote:
>>
>>> Found it at *org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.java* line
>>> no. 1122
>>> There is some concern mentioned that supporting OR would lead to data
>>> explosion. Is it discussed/documented in a little more detail somewhere? If
>>> so, some pointers towards the same will be helpful.
>>>
>>> Thanks,
>>> MIS.
>>>
>>> On Tue, Mar 22, 2011 at 1:19 PM, MIS <misapa...@gmail.com> wrote:
>>>
>>>> I want to use OR in the join expression, but it seems only AND is supported
>>>> as of now.
>>>> I have a workaround using De Morgan's law {C1 OR C2 = !(!C1 AND !C2)},
>>>> but it would be nice if somebody could point me to the location in the
>>>> code base that would need modification to support OR in the join
>>>> expression.
>>>>
>>>> Thanks,
>>>> MIS.
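
For readers following the thread, the difference between a hash-based join and a nested-loop join can be sketched in a few lines of Python. This is only an illustration and not Hive code: the functions hash_join and nested_loop_join, the sample rows, and the column names are all hypothetical. It assumes, as the map-side join does, that the right-hand table is small enough to hold in memory.

```python
from collections import defaultdict

def hash_join(left, right, key):
    """Equi-join only: build a hash table on the small right side, probe with left rows."""
    table = defaultdict(list)
    for r in right:
        table[r[key]].append(r)
    # A hash lookup can only answer "same key value", so OR across columns does not fit.
    return [(l, r) for l in left for r in table.get(l[key], [])]

def nested_loop_join(left, right, predicate):
    """Compare every left row with every right row; any predicate works, including OR."""
    return [(l, r) for l in left for r in right if predicate(l, r)]

# Hypothetical sample data.
left = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]
right = [{"id": 1, "name": "x"}, {"id": 9, "name": "b"}]

# Join condition C1 OR C2: rows match on id or on name.
def or_pred(l, r):
    return l["id"] == r["id"] or l["name"] == r["name"]

# The same condition written with De Morgan's law: !(!C1 AND !C2).
def demorgan_pred(l, r):
    return not (l["id"] != r["id"] and l["name"] != r["name"])

or_rows = nested_loop_join(left, right, or_pred)
demorgan_rows = nested_loop_join(left, right, demorgan_pred)
id_rows = hash_join(left, right, "id")
```

In this sketch the nested loop evaluates the predicate once per pair of rows, so its cost grows with the product of the two table sizes, which is why it is only practical when one side is small. The De Morgan form is logically identical to the OR form, so rewriting the condition that way does not make it any easier for a hash-based or sort-merge operator to evaluate.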