Re: HadoopDB and similar stuff

Ted Dunning Tue, 15 Sep 2009 09:20:27 -0700

On Tue, Sep 15, 2009 at 7:28 AM, Jeff Hammerbacher <[email protected]> wrote:


> ... I would like to correct any
> misperceptions which may exist in the community.
>
> 1) HiveQL intends to include SQL as a subset of its syntax: see the VLDB
> paper for more (
> http://www.slideshare.net/namit_jain/hive-demo-paper-at-vldb-2009). As it
> stands today, a reasonable subset of SQL is already supported, and most
> users of MySQL, Oracle, or PostgreSQL will be able to work comfortably in
> Hive today.
>

Note the key word "intends".  That indicates future tense.

As you say, it is a reasonable subset.  I am sure that there are wide
swaths of SQL semantics that are not implemented.  Transactions, rollback,
fancy outer joins, exactly correct syntax for null, and row updates and
deletions are areas where I would expect deficiencies relative to SQL.
Conversely, I doubt that there are many Hive programs that could run
without major alterations on conventional SQL engines.

The result is that HiveQL != SQL.  It is more correct to say HiveQL =kinda=
SQL.
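To make the kind of semantics I mean concrete, here is a minimal sketch
against a conventional SQL engine.  It uses SQLite through Python's standard
sqlite3 module purely as a stand-in; the table name and values are made up
for illustration.  It shows an in-place row update and a row deletion inside
a transaction, followed by a rollback that undoes both.

```python
import sqlite3


def demo_rollback():
    """Update and delete rows in a transaction, then roll it back."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table, used only for this illustration.
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
    conn.commit()

    # sqlite3 opens a transaction implicitly before the first write.
    conn.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 1")
    conn.execute("DELETE FROM accounts WHERE id = 2")
    inside = conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall()

    # Rollback discards both the update and the delete.
    conn.rollback()
    after = conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall()
    conn.close()
    return inside, after


if __name__ == "__main__":
    inside, after = demo_rollback()
    print("inside transaction:", inside)
    print("after rollback:   ", after)
```

Someone coming from MySQL, Oracle, or PostgreSQL takes this pattern for
granted, which is why I would expect its absence to be a surprise.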

> 2) There's a patch for SQL support in Pig:
> http://issues.apache.org/jira/browse/PIG-824.
>

More future tense.  This is hardly part of Pig at this point.  I expect that
this will come closer to SQL than the current HiveQL, but it is likely to
lack key semantic properties because of the substrate it runs on, and it
will also have some important additions.

> Every database implements a different dialect of SQL (e.g. express a Top K
> query in your favorite database and compare to the rest), and the Pig and
> HiveQL dialects are as valid as any other.


This level of cultural relativism is a bit disingenuous.  My point is that
you are setting up unreasonable expectations.  MR-based systems are
inherently very different from traditional databases (which is, of course,
the POINT of having MR).  SQL is very strongly tied to the underlying row
update and transactional semantics of traditional databases.

I am NOT saying that Hive and Pig are not useful.  For many things, I prefer
them to SQL-based systems.  I am just saying that they are different
animals.

I am also NOT saying that Hive and Pig aren't a good way for SQL-based
programmers to transition to map-reduce.  I am just saying that you should
tell people that Hive and Pig are similar to SQL so you don't have their
heads explode when they realize that they aren't really SQL.

Remember that many, many people claimed that MyISAM tables are not really
SQL.  Hive is a darned sight further from SQL than that.
