RE: New to hive... slow query performance

Sharma, Raghvendra Sun, 26 Sep 2010 21:22:47 -0700

Thanks for the responses.

Perhaps I am trying something different here. Or maybe I am looking at an 
unsuitable product for my requirement, though finding that out is the objective 
of my little test.


I need to load a few million rows into a database every day. It's not a 
file-based system; I have comma-delimited rows (of columns) which would fit a 
relational database exactly.

After loading, I need to provide a very fast search mechanism. Having looked a 
bit at Google's implementation of bigtable and the structure around it, I 
originally thought of using hive integrated with hbase, hive because of its 
querying capabilities. The loading works out fine, with better performance than 
the RDBMS. However, the querying bottleneck, which was the reason to look for 
alternatives to an RDBMS in the first place, remains with hive too.

Now, interpreting the responses to my original question, I feel that hive might 
not be the answer to my requirement for a fast query engine on top of the db.

Is there something else? Any other tool/solution/library that I can put on top 
of hbase? Or even without hbase? (I looked at hbase as an alternative to the 
RDBMS, moving towards distributed computing.)

Suggestions please...

--raghav..

From: wd [mailto:w...@wdicc.com]
Sent: Thursday, September 23, 2010 7:41 PM
To: hive-user@hadoop.apache.org
Subject: Re: New to hive... slow query performance

Hi,

I think hive and hadoop just provide a way to easily scale up your data 
analysis; they will not be faster than any db on a single node. If your data is 
not large, for example only 1GB per day, I think you should not use them.

2010/9/23 Sharma, Raghvendra <sraghven...@corelogic.com>
Hi,

I am very new to hive, have just been able to load some data into it.

I am running hadoop on an old Pentium 4 box with 4 GB of RAM.
It's a single node cluster, and configured based on tutorials from apache site 
and others.

The load speeds to hdfs look ok, I am able to load approx 20 million rows in 
around 2 minutes.
However, the querying is pathetic. It takes minutes to come back with a single 
where clause, and a simple count(*) sends it to sleep. A join takes hours.
Since I am just starting, I am not using anything fancy, such as clustering or 
partitioning of the table. (Is that the wrong choice? I thought I'd start simple.)

Somehow I have a feeling that it has something to do with the wrong kind of 
configuration.

Can someone help me?

PS: Are there no "forums" for hadoop/hive? I couldn't find any. :(

--raghav

