Yes, I agree.
Moving work that should be done by developers onto users is never
user-friendly. :)
For star schemas, people can easily find out which tables are the small ones.
Joining more than one big table is always risky.

But until we have a cost-based optimizer, it seems we have no other choice.
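
For reference, a minimal sketch of the manual workaround using the STREAMTABLE hint. The table and column names (fact_table, dim_table, key, val) are hypothetical and only illustrate a big table joined to a small one on a single key:

```sql
-- Hypothetical tables: fact_table is assumed to be the large one,
-- dim_table the small one. The hint asks Hive to stream fact_table
-- instead of buffering its rows, regardless of its position in the JOIN.
SELECT /*+ STREAMTABLE(f) */ f.val, d.val
FROM fact_table f
JOIN dim_table d ON (f.key = d.key);
```

Without the hint, the same effect should come from writing the large table last in the JOIN clause.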

Thanks
Yongqiang
On 2/19/10 11:35 AM, "Edward Capriolo" <[email protected]> wrote:

> On Fri, Feb 19, 2010 at 2:25 PM, Edward Capriolo <[email protected]>
> wrote:
>> On Fri, Feb 19, 2010 at 12:35 AM, Yongqiang He
>> <[email protected]> wrote:
>>> Hi Edward,
>>> You can do it with the STREAMTABLE hint. Hive will put the table named in
>>> that hint in the rightmost position.
>>> 
>>> -yongqiang
>>> On 2/18/10 3:21 PM, "Edward Capriolo" <[email protected]> wrote:
>>> 
>>>> I have worked through this issue.
>>>> 
>>>> * When doing a join, please put the table with the largest number of rows
>>>> containing the same join key
>>>> rightmost in the JOIN clause. Otherwise we may see OutOfMemory errors.
>>>> 
>>>> This advice does work, but should we open up a jira to create a simple
>>>> optimizer that does this?
>>>> 
>>>> Edward
>>>> 
>>>> 
>>> 
>>> 
>>> 
>> 
>> I do not understand the hint. A user can rewrite the query, can't they?
>> 
>> select a join b
>> select b join a
>> 
>> What I am asking is: should we add an optimizer that applies heuristics
>> to the tables and automatically streams the larger one?
>> 
> 
> The reason I am mentioning this is that I am training Hive users right now.
> You can imagine that the first three-table join someone did caused an
> OOM. I explained to them roughly how a Hive join works and how you
> should move the largest table to one side. They understood but
> replied, "Sounds like something an optimizer could handle."
> 
> Even when joining two tables, it is a pain to ask someone to find which
> table is larger. Imagine joining 10 or so. There is also user perception:
> imagine your first join throwing an OOM.
> 
> 

