Re: HadoopDB and similar stuff

Edward Capriolo Tue, 15 Sep 2009 08:35:30 -0700

On Tue, Sep 15, 2009 at 10:28 AM, Jeff Hammerbacher <[email protected]> wrote:
> Hey Ted,
> I don't want to derail this thread, but I would like to correct any
> misperceptions which may exist in the community.
>
> 1) HiveQL intends to include SQL as a subset of its syntax: see the VLDB
> paper for more (
> http://www.slideshare.net/namit_jain/hive-demo-paper-at-vldb-2009). As it
> stands today, a reasonable subset of SQL is already supported, and most
> users of MySQL, Oracle, or PostgreSQL will be able to work comfortably in
> Hive today.
>
> 2) There's a patch for SQL support in Pig:
> http://issues.apache.org/jira/browse/PIG-824.
>
> Every database implements a different dialect of SQL (e.g. express a Top K
> query in your favorite database and compare to the rest), and the Pig and
> HiveQL dialects are as valid as any other. If you disagree, I'd love to hear
> your perspective on why these languages are "not SQL".
>
> Regards,
> Jeff
>
> On Tue, Sep 15, 2009 at 12:42 AM, Ted Dunning <[email protected]> wrote:
>
>> uhhh... neither pig nor hive are really SQL.  Higher level of abstraction
>> than pure MR, but not SQL.
>>
>> You are right to include Greenplum, though.  They slipped my mind, probably
>> because they don't have a google ad running every 30 seconds like
>> Aster
>> does.
>>
>> On Mon, Sep 14, 2009 at 11:21 PM, Jeff Hammerbacher <[email protected]
>> >wrote:
>>
>> > >
>> > > Do you want to tightly integrate SQL and map-reduce?  Asterdata has a
>> > > product that might help you.
>> > >
>> >
>> > As does Greenplum. You could also get this functionality from Pig or
>> Hive,
>> > which are Apache 2.0-licensed subprojects of Hadoop.
>>
>>
>>
>>
>> --
>> Ted Dunning, CTO
>> DeepDyve
>>
>


I notice we have mentioned Greenplum and Aster. I should say up front
that I have never used either product, but I have spoken to some
sales reps over the years, who were very helpful, I might add.

* caveat: I am not saying that my price information is accurate or current

But the major deal breaker at my old places of employment was always
cost. Per-TB pricing was a major deal breaker for us. We wanted to
keep our data indefinitely, but most of our reporting was
month-over-month. So having to keep all our data (which we didn't
really use much after two months) in a system that charges by the TB
was expensive and would have become more expensive as our data set
grew.

In the solution space you get a lot of bang for your buck with
(Hadoop+Hive) vs (Teradata, Greenplum, Aster), since, as you know, the
price of Hadoop+Hive is (0+0) plus hardware.

Hive is not 100% SQL, but I would say join the Hive user list and be
amazed. New types of joins, theta-join, etc. have been added by user
request. Most of the time, if you can't do something you would expect
to do in SQL, there is a workaround.

The flip side is true as well: Hive has specific support that other
databases don't :)
