Hive and protocol buffers -- are there UDFs for dealing with them?

2010-07-12 Thread Leo Alekseyev
Hi all, I was wondering if anyone is using Hive with protocol buffers. The Hadoop wiki links to http://www.slideshare.net/ragho/hive-user-meeting-august-2009-facebook for SerDe examples; there it says that there is no built-in support for protobufs. Since this presentation is about a year old, I

Lost old tables in hive metastore after installing a new version of hive

2010-08-10 Thread Leo Alekseyev
I built hive from SVN, and am running into the following issue: I can't seem to access any of my old tables, that is, SHOW TABLES doesn't return any entries. I can create new tables, and they show up in HDFS in /user/hive/warehouse alongside the old ones. Likewise, when I run the old version of

Why two map stages for a simple select query?

2010-08-13 Thread Leo Alekseyev
Hi all, I'm mystified by Hive's behavior for two types of queries. 1: consider the following simple select query: insert overwrite table alogs_test_extracted1 select raw.client_ip, raw.cookie, raw.referrer_flag from alogs_test_rc6 raw; Both tables are stored as rcfiles, and LZO compression is

Re: Why two map stages for a simple select query?

2010-08-13 Thread Leo Alekseyev
by setting hive.merge.mapfiles=false. Likewise hive.merge.mapredfiles is used to control whether to merge the result of a map-reduce job. On Aug 13, 2010, at 8:16 PM, Leo Alekseyev wrote: Hi all, I'm mystified by Hive's behavior for two types of queries. 1: consider the following simple

Re: alter table foo set location fails

2010-08-24 Thread Leo Alekseyev
I believe I can answer my own question: HIVE-1514 was committed on 08/10; my Hive was built on 08/09... On Mon, Aug 23, 2010 at 7:25 PM, Leo Alekseyev dnqu...@gmail.com wrote: Hi all, I'm looking at http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL, where there's a code example

Re: Hive can't run query with a TextInputFormat exception

2010-09-15 Thread Leo Alekseyev
This is a me too post: we just ran into an identical problem setting up a new cluster using CDH3b2. This is all rather mystifying, because all the correct libraries are there; in fact the hive command line looks something like /usr/java/jdk1.6.0_12/bin/java -Xmx256m -server

Re: Hive can't run query with a TextInputFormat exception

2010-09-15 Thread Leo Alekseyev
On Wed, Sep 15, 2010 at 1:14 PM, Leo Alekseyev dnqu...@gmail.com wrote: This is a me too post: we just ran into an identical problem setting up a new cluster using CDH3b2. This is all rather mystifying, because all the correct libraries are there; in fact the hive command line looks something like

Is it possible to have counters in a transform map or reduce script?

2010-09-15 Thread Leo Alekseyev
I would like my Hive transform scripts to report some numbers at the end of the job. If I were using straight-up hadoop streaming, I would do something like sys.stderr.write("reporter:counter:bad_lines,INCOMPLETE_LINES,1\n") I added this line to a script used in the transform... using foo.py
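
For reference, a minimal sketch of the Hadoop streaming counter convention mentioned in the message above: a line of the form reporter:counter:<group>,<counter>,<amount> written to stderr. The helper names (report_counter, process), the tab-separated input, and the three-field threshold are assumptions made for illustration; they are not taken from the original foo.py. Whether Hive's transform picks these counters up is exactly what the thread asks, so this only shows the streaming side.

```python
import sys


def report_counter(group, counter, amount=1, stream=sys.stderr):
    # Hadoop streaming reads counter updates from stderr lines of the form
    # reporter:counter:<group>,<counter>,<amount>
    stream.write("reporter:counter:%s,%s,%d\n" % (group, counter, amount))


def process(lines, expected_fields=3, out=sys.stdout, err=sys.stderr):
    # Hypothetical transform: pass complete tab-separated rows through
    # unchanged and count the rows that have too few fields.
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < expected_fields:
            report_counter("bad_lines", "INCOMPLETE_LINES", 1, err)
            continue
        out.write("\t".join(fields) + "\n")


if __name__ == "__main__":
    process(sys.stdin)
```

Incrementing by 1 per bad row, rather than writing one total at the end, keeps the counter meaningful even if the script is killed partway through its input.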

Re: Incremental load from Hive into HBase?

2010-09-28 Thread Leo Alekseyev
On Tue, Sep 28, 2010 at 7:50 PM, Leo Alekseyev dnqu...@gmail.com wrote: I can create and load data into an HBase table as per the instructions from Hive/HBase Integration wiki page using something like create table ... STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler

Is it possible to speed up select ... limit query?

2010-10-06 Thread Leo Alekseyev
Suppose I have a large-ish table (over a billion rows) and want to grab the first 5 million or so. When I run the query create table foo_subset as select col1, col2, col3 from foo limit 500, the job launches one reducer, which runs for a while. Looking at HDFS_BYTES_READ/WRITTEN, I see that