On Tue, Sep 15, 2009 at 10:28 AM, Jeff Hammerbacher <[email protected]> wrote:

> Hey Ted,
> I don't want to derail this thread, but I would like to correct any
> misperceptions which may exist in the community.
>
> 1) HiveQL intends to include SQL as a subset of its syntax: see the VLDB
> paper for more
> (http://www.slideshare.net/namit_jain/hive-demo-paper-at-vldb-2009). As it
> stands today, a reasonable subset of SQL is already supported, and most
> users of MySQL, Oracle, or PostgreSQL will be able to work comfortably in
> Hive today.
>
> 2) There's a patch for SQL support in Pig:
> http://issues.apache.org/jira/browse/PIG-824.
>
> Every database implements a different dialect of SQL (e.g. express a Top K
> query in your favorite database and compare to the rest), and the Pig and
> HiveQL dialects are as valid as any other. If you disagree, I'd love to hear
> your perspective on why these languages are "not SQL".
>
> Regards,
> Jeff
>
> On Tue, Sep 15, 2009 at 12:42 AM, Ted Dunning <[email protected]> wrote:
>
>> uhhh... neither pig nor hive are really SQL. Higher level of abstraction
>> than pure MR, but not SQL.
>>
>> You are right to include Greenplum, though. They slipped my mind, probably
>> because they don't have a google ad running everything 30 seconds like
>> Aster does.
>>
>> On Mon, Sep 14, 2009 at 11:21 PM, Jeff Hammerbacher <[email protected]>
>> wrote:
>>
>> > > Do you want to tightly integrate SQL and map-reduce? Asterdata has a
>> > > product that might help you.
>> >
>> > As does Greenplum. You could also get this functionality from Pig or
>> > Hive, which are Apache 2.0-licensed subprojects of Hadoop.
>>
>> --
>> Ted Dunning, CTO
>> DeepDyve

I notice we have mentioned Greenplum and Aster. I should say up front that I have never used either product, but over the years I have spoken to some of their sales reps, who were very helpful, I might add.

* Caveat: I am not saying that my price information is accurate or current.

The major deal breaker at my old places of employment was always cost, and per-TB pricing in particular. We wanted to keep our data indefinitely, but most of our reporting is month-over-month. Keeping all of our data (which we don't really use much after two months) in a system that charges by the TB was expensive, and would only become more expensive as our data set grew.

In this solution space you get a lot of bang for your buck with Hadoop+Hive versus Teradata, Greenplum, or Aster: the price of Hadoop+Hive is 0+0, plus hardware.

Hive is not 100% SQL, but I would say join the Hive user list and be amazed. New types of joins, theta-join, etc. have been added by user request. Most of the time, if you can't do something you would expect to be able to do in SQL, there is a workaround. The flip side is true as well: Hive has specific support that other databases don't :)
