Thanks Amr,

Without knowing the details of Hive, one constraint of the SQL model is that you can never generate more than one record from a single record. I don't know how this is done in Hive. Another question is whether a Hive script can take in user-defined functions?
Using the following word count as an example, can you show me what the Pig script and Hive script look like?

Map:
  Input: a line (a collection of words)
  Output: multiple [word, 1]

Reduce:
  Input: [word, [1, 1, 1, ...]]
  Output: [word, count]

Rgds,
Ricky

-----Original Message-----
From: Amr Awadallah [mailto:a...@cloudera.com]
Sent: Wednesday, May 06, 2009 3:14 PM
To: core-user@hadoop.apache.org
Subject: Re: PIG and Hive

> The difference between PIG and Hive seems to be pretty insignificant.

Difference between Pig and Hive is significant, specifically:

(1) Pig doesn't require underlying structure to the data, Hive does imply structure via a metastore. This has its pros and cons. It allows Pig to be more suitable for ETL-type tasks where the input data is still a mish-mash and you want to convert it to be structured. On the other hand, Hive's metastore provides a dictionary that lets you easily see what columns exist in which tables, which can be very handy.

(2) Pig is a new language, easy to learn if you know languages similar to Perl. Hive is a subset of SQL with very simple variations to enable map-reduce-like computation. So, if you come from a SQL background you will find Hive QL extremely easy to pick up (many of your SQL queries will run as is), while if you come from a procedural programming background (w/o SQL knowledge) then Pig will be much more suitable for you. Furthermore, Hive is a bit easier to integrate with other systems and tools since it speaks the language they already speak (i.e. SQL).

You're right that HBase is a completely different game. HBase is not about being a high-level language that compiles to map-reduce; HBase is about allowing Hadoop to support lookups/transactions on key/value pairs. HBase allows you to (1) do quick random lookups, versus scanning all of the data sequentially, (2) do insert/update/delete from the middle, not just add/append.

-- amr

Ricky Ho wrote:
> Jeff,
>
> Thanks for the pointer.
> It is pretty clear that Hive and PIG are the same kind and HBase is a
> different kind.
> The difference between PIG and Hive seems to be pretty insignificant.
> Layering a tool on top of them can completely hide their difference.
>
> I am viewing your PIG and Hive tutorial and hopefully can extract some
> technical details there.
>
> Rgds,
> Ricky
> -----Original Message-----
> From: Jeff Hammerbacher [mailto:ham...@cloudera.com]
> Sent: Wednesday, May 06, 2009 1:38 PM
> To: core-user@hadoop.apache.org
> Subject: Re: PIG and Hive
>
> Here's a permalink for the thread on MarkMail:
> http://markmail.org/thread/ee4hpcji74higqvk
>
> On Wed, May 6, 2009 at 4:55 AM, Sharad Agarwal <shara...@yahoo-inc.com> wrote:
>
>> see core-user mail thread with subject "HBase, Hive, Pig and other Hadoop
>> based technologies"
>>
>> - Sharad
>>
>> Ricky Ho wrote:
>>
>>> Are they competing technologies of providing a higher level language for
>>> Map/Reduce programming ?
>>>
>>> Or are they complementary ?
>>>
>>> Any comparison between them ?
>>>
>>> Rgds,
>>> Ricky
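
The Map and Reduce steps of the word count outlined at the top of this thread can be sketched in plain Python. This is only an illustration of the data flow (one input line producing many [word, 1] records, which are then grouped and summed). It is not Pig or Hive syntax and does not use any Hadoop API; the function names map_line, shuffle, reduce_counts, and word_count are made up for this example.

```python
from collections import defaultdict


def map_line(line):
    """Map: one input line -> multiple (word, 1) records."""
    return [(word, 1) for word in line.split()]


def shuffle(pairs):
    """Group map output by key: (word, 1) pairs -> {word: [1, 1, ...]}."""
    grouped = defaultdict(list)
    for word, one in pairs:
        grouped[word].append(one)
    return grouped


def reduce_counts(word, ones):
    """Reduce: (word, [1, 1, 1, ...]) -> (word, count)."""
    return (word, sum(ones))


def word_count(lines):
    """Run map, shuffle, and reduce in-process over a list of lines."""
    pairs = [pair for line in lines for pair in map_line(line)]
    return dict(reduce_counts(w, ones) for w, ones in shuffle(pairs).items())
```

For instance, word_count(["pig hive pig", "hive hbase"]) gives a count of 2 for "pig", 2 for "hive", and 1 for "hbase". The point relevant to the question above is map_line: a single input record fans out into several output records, which is the step a plain row-to-row SQL projection does not express on its own.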