Re: How HIVE manages a join

akshaya iyengar Thu, 12 Aug 2010 14:05:45 -0700

Hello,
I apologize if this is off-topic for the current thread. I was looking
for the Hive architecture diagram on this page:
http://wiki.apache.org/hadoop/Hive/Design . The PDF link doesn't seem to work
for me either.


It would be a great help if someone could direct me to this information.

Thanks,
Akshaya

On Thu, Aug 12, 2010 at 4:37 PM, Edward Capriolo <[email protected]> wrote:

> Joydeep,
>
> I am sorry. I put that there when I thought we were going to actively move
> to xdocs. You can remove that if you like.
>
> As I said in an earlier thread, the problem with the wiki is that no one
> actively updates it. Example:
>
> http://wiki.apache.org/hadoop/Hive/LanguageManual/Select
> oopse: Really what about "in support"...
> https://issues.apache.org/jira/browse/HIVE-801
>
> Which is why I hold the opinion that all patches except bug fixes
> should probably come with xdocs. People are free to disagree.
>
> Edward
>
> On Thu, Aug 12, 2010 at 3:16 AM, Joydeep Sen Sarma <[email protected]>
> wrote:
> > i hate this message: 'THIS PAGE WAS MOVED TO HIVE XDOCS ! DO NOT
> EDIT!Join Syntax'
> >
> > why must edits to the wiki be banned if there are xdocs? hadoop has both.
> >
> > there will always be things that are not captured in xdocs. it's pretty
> sad to discourage free form edits by people who want to contribute without
> checking out source. (what is this - the 80s?)
> > ________________________________________
> > From: Edward Capriolo [[email protected]]
> > Sent: Tuesday, August 10, 2010 2:57 PM
> > To: [email protected]
> > Cc: [email protected]
> > Subject: Re: How HIVE manages a join
> >
> > Sorry.
> > $hive_root/docs/xdocs/language_manual/joins.xml
> >
> > On Tue, Aug 10, 2010 at 5:57 PM, Edward Capriolo <[email protected]>
> wrote:
> >> This page is already in version control.
> >>
> >> /home/edward/cassandra-handler/docs/xdocs/language_manual/joins.xml
> >>
> >> Edward
> >>
> >> On Tue, Aug 10, 2010 at 5:15 PM, Carl Steinbach <[email protected]>
> wrote:
> >>> Hi Yongqiang,
> >>> Please go ahead and update the wiki page. I will copy it over to
> version
> >>> control when you are done.
> >>> Thanks.
> >>> Carl
> >>>
> >>> On Tue, Aug 10, 2010 at 2:11 PM, yongqiang he <
> [email protected]>
> >>> wrote:
> >>>>
> >>>> In the Hive Join wiki page, it says
> >>>> "THIS PAGE WAS MOVED TO HIVE XDOCS ! DO NOT EDIT!Join Syntax"
> >>>>
> >>>> Where should i do the update?
> >>>>
> >>>> On Fri, Aug 6, 2010 at 11:46 PM, yongqiang he <
> [email protected]>
> >>>> wrote:
> >>>> > Yeah. The sort merge bucket mapjoin has been finished for some time,
> >>>> > and seems stable now. I did one skew join but haven't had a chance to
> >>>> > look at the other skew join Namit mentioned to me. But I definitely
> >>>> > should have updated the wiki earlier. My bad.
> >>>> >
> >>>> > On Fri, Aug 6, 2010 at 8:32 PM, Jeff Hammerbacher <
> [email protected]>
> >>>> > wrote:
> >>>> >> Yongqiang mentioned he was going to update the wiki with this
> >>>> >> information in
> >>>> >> the thread at http://hadoop.markmail.org/thread/hxd4uwwukuo46lgw.
> >>>> >>
> >>>> >> Yongqiang, have you gotten a chance to complete the sort merge
> bucket
> >>>> >> map
> >>>> >> join and the other skew join you mention in the above thread?
> >>>> >>
> >>>> >> Thanks,
> >>>> >> Jeff
> >>>> >>
> >>>> >> On Fri, Aug 6, 2010 at 3:43 AM, bharath vissapragada
> >>>> >> <[email protected]> wrote:
> >>>> >>>
> >>>> >>> Roberto ..
> >>>> >>>
> >>>> >>> You may find these links useful ..
> >>>> >>>
> >>>> >>>
> >>>> >>>
> >>>> >>>
> http://www.slideshare.net/ragho/hive-icde-2010?src=related_normal&rel=2374551
> >>>> >>> - Simple joins and optimizations..
> >>>> >>>
> >>>> >>>
> >>>> >>>
> http://www.slideshare.net/zshao/hive-user-meeting-march-2010-hive-team  -
> >>>> >>> New kind of joins / features of hive ..
> >>>> >>>
> >>>> >>> Thanks
> >>>> >>>
> >>>> >>> Bharath.V
> >>>> >>> 4th year Undergraduate..
> >>>> >>> IIIT Hyderabad
> >>>> >>>
> >>>> >>> On Fri, Aug 6, 2010 at 12:16 PM, Cappa Roberto
> >>>> >>> <[email protected]> wrote:
> >>>> >>>>
> >>>> >>>> Hi,
> >>>> >>>>
> >>>> >>>> I cannot find any documentation about which algorithm HIVE uses to
> >>>> >>>> translate JOIN clauses into Map-Reduce tasks.
> >>>> >>>>
> >>>> >>>> In particular, say I have two tables A and B, each table is written
> >>>> >>>> to a separate file, and each file is split across hadoop nodes. When
> >>>> >>>> I perform a JOIN with A.column = B.column, the framework has to
> >>>> >>>> compare the full data from the first file with the full data from the
> >>>> >>>> second file. How can hadoop perform a full scan of all possible
> >>>> >>>> combinations of values? If each node contains only a portion of each
> >>>> >>>> file, a complete comparison does not seem possible. Is one of the two
> >>>> >>>> files entirely replicated on each node? Or does HIVE use another kind
> >>>> >>>> of strategy/optimization?
> >>>> >>>>
> >>>> >>>> Thanks.
> >>>> >>
> >>>> >>
> >>>> >
> >>>
> >>>
> >>
> >
>
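
On the original question quoted above (how a JOIN on A.column = B.column can
work when both files are split across nodes), here is a minimal sketch of the
two general strategies the thread alludes to. It is plain Python, not Hive or
Hadoop code, and every function name in it is hypothetical. It assumes rows
are tuples with the join key in the first position. The first strategy is a
reduce-side join: mappers tag each row with its table and emit it under its
join key, the shuffle brings all rows with the same key together, and the
reducer pairs A rows with B rows per key, so no node needs a whole file. The
second is a map-side join, where the smaller table is copied to every mapper
as an in-memory hash table and no reduce step is needed.

```python
from collections import defaultdict
from itertools import product


def map_phase(table_tag, rows):
    """Emit (join_key, (table_tag, row)) for each row of one file split."""
    for row in rows:
        yield row[0], (table_tag, row)


def shuffle(pairs):
    """Group mapper output by key, standing in for the shuffle/sort step."""
    groups = defaultdict(list)
    for key, tagged_row in pairs:
        groups[key].append(tagged_row)
    return groups


def reduce_phase(groups):
    """For each key, pair every A row with every B row that shares it."""
    joined = []
    for key in sorted(groups):
        a_rows = [row for tag, row in groups[key] if tag == "A"]
        b_rows = [row for tag, row in groups[key] if tag == "B"]
        joined.extend(a + b for a, b in product(a_rows, b_rows))
    return joined


def reduce_side_join(a_splits, b_splits):
    """Each split is mapped independently; matching happens after the shuffle."""
    pairs = []
    for split in a_splits:
        pairs.extend(map_phase("A", split))
    for split in b_splits:
        pairs.extend(map_phase("B", split))
    return reduce_phase(shuffle(pairs))


def map_side_join(a_splits, small_b):
    """Copy the small table to every mapper as a hash table; no reduce step."""
    lookup = defaultdict(list)
    for row in small_b:
        lookup[row[0]].append(row)
    joined = []
    for split in a_splits:  # each split is processed on its own
        for a in split:
            joined.extend(a + b for b in lookup.get(a[0], []))
    return joined


if __name__ == "__main__":
    a_splits = [[(1, "x")], [(2, "y"), (1, "z")]]  # table A spread over 2 splits
    b_splits = [[(1, "p")], [(3, "q")]]            # table B spread over 2 splits
    print(reduce_side_join(a_splits, b_splits))
    print(map_side_join(a_splits, [(1, "p"), (3, "q")]))
```

The sort merge bucket mapjoin and skew join mentioned earlier in the thread
are refinements of these ideas and are not modelled here.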
