Looks like that patch has a bug.  It should not be changing 
PARTITIONS.PART_NAME, which is an indexed column.  Try again, undoing that line 
of the patch first.
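For context, MySQL (InnoDB) refuses to create an index whose key exceeds 767 bytes, which is why touching the indexed PART_NAME column blows up at drop time.  A minimal illustration against a hypothetical table, assuming a utf8 metastore where each character costs 3 bytes:

    -- 255 chars * 3 bytes = 765 bytes, so this index just fits
    CREATE TABLE part_demo (part_name VARCHAR(255), KEY idx_part (part_name));

    -- rebuilding the index over the widened column fails:
    -- ERROR 1071 (42000): Specified key was too long; max key length is 767 bytes
    ALTER TABLE part_demo MODIFY part_name VARCHAR(4000);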

JVS
________________________________________
From: Ray Duong [[email protected]]
Sent: Wednesday, June 16, 2010 5:24 PM
To: [email protected]
Subject: Re: Hive-Hbase with large number of columns

Hi,

I applied the patch HIVE-1364 and rebuilt the metastore.  I was able to create 
an external table in Hive with a large number of columns (up to 4000 bytes).

Now when I try to drop the external table, I get the following error message.  
Is there another file that I need to modify in order to drop the table?

hive> drop table hbase_test;

FAILED: Error in metadata: javax.jdo.JDODataStoreException: Error(s) were found 
while auto-creating/validating the datastore for classes. The errors are 
printed in the log, and are attached to this exception.
NestedThrowables:
com.mysql.jdbc.exceptions.MySQLSyntaxErrorException: Specified key was too 
long; max key length is 767 bytes
FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.DDLTask

Thanks,
-ray


On Tue, Jun 15, 2010 at 4:05 PM, John Sichi 
<[email protected]> wrote:
Column-level properties are attractive for other reasons, but I don't think we 
should do it as a workaround for underlying limits.  I've noted in JIRA that I 
think a LOB would be more appropriate here.

Note that while you're waiting for a permanent resolution, you can use ALTER 
TABLE on your metastore DBMS to widen the precision.
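On a MySQL metastore that would be something along these lines (column names taken from the SERDE_PARAMS insert quoted below; double-check against your actual schema first):

    -- widen the column that stores SERDEPROPERTIES values; it is not
    -- part of an index, so growing it past 767 bytes is safe
    ALTER TABLE SERDE_PARAMS MODIFY PARAM_VALUE VARCHAR(4000);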

JVS

On Jun 15, 2010, at 3:37 PM, Ray Duong wrote:

Yes, I think I might have to do that.  I was trying to avoid multiple HBase 
scans with one big table.

BTW, would it be better to store the SERDE properties per column rather than at 
the table level, to avoid the 767 or 4000 byte limitation?

Thanks again,
-ray


On Tue, Jun 15, 2010 at 2:42 PM, Edward Capriolo 
<[email protected]> wrote:


On Tue, Jun 15, 2010 at 5:04 PM, Ray Duong 
<[email protected]> wrote:
Thanks for all the help.

-ray


On Tue, Jun 15, 2010 at 1:26 PM, Carl Steinbach 
<[email protected]> wrote:
Hi Ray,

4000 bytes is the maximum VARCHAR size allowed on Oracle 9i/10g/11g.  As far as 
I know, this is the smallest maximum VARCHAR size among the databases we 
currently try to support (MySQL, Oracle, Derby, etc.).

Carl


On Tue, Jun 15, 2010 at 1:15 PM, Ray Duong 
<[email protected]> wrote:
Thanks John/Carl,

Yep, there seems to be a limit at the 767 byte size.  I see the patch HIVE-1364 
sets it to 4000 bytes.  I'm using Derby; do you know if there is a limit beyond 
4000 bytes?

-ray

Error:
Caused by: ERROR 22001: A truncation error was encountered trying to shrink 
VARCHAR 'segment:ITC_10#ITC_1001,segment:CITC_10#ITC_1001,segment:ITC&' to 
length 767.




On Tue, Jun 15, 2010 at 12:26 PM, Carl Steinbach 
<[email protected]> wrote:
Hi Ray,

There is currently a 767 byte size limit on SERDEPROPERTIES values (see 
http://issues.apache.org/jira/browse/HIVE-1364). It's possible that you're 
bumping into this limitation (assuming you abbreviated the column names in your 
example).


On Tue, Jun 15, 2010 at 12:03 PM, John Sichi 
<[email protected]> wrote:
That exception is coming from the metastore (trying to write the table 
definition).  Could you dig down into the Hive logs to see if you can get the 
underlying cause?

You can get the logs to spew on console by adding "-hiveconf 
hive.root.logger=DEBUG,console" to your Hive CLI invocation.
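For example:

    hive -hiveconf hive.root.logger=DEBUG,console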

JVS

On Jun 15, 2010, at 11:57 AM, Ray Duong wrote:

Hi,

I'm trying to map an HBase table in Hive that contains a large number of 
columns.  Since HBase is designed for wide tables, does the Hive/HBase 
integration have any set limit on the number of columns it can map in one 
table?  I seem to hit a limit at 10 columns.

Thanks,
-ray

create external table hbase_t1
(
  key string,
  f1_a string,
  f2_a string,
  f1_b string,
  f2_b string,
  ...
  f1_m string,
  f2_m string
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" =
":key,f1:a,f2:a,f1:b,f2:b,f1:c,f2:c,f1:d,f2:d,f1:e,f2:e,f1:f,f2:f,f1:g,f2:g,f1:h,f2:h,f1:i,f2:i,f1:j,f2:j,f1:k,f2:k,f1:l,f2:l,f1:m,f2:m"
)
TBLPROPERTIES("hbase.table.name" = "t1");

Error Message:

FAILED: Error in metadata: javax.jdo.JDODataStoreException: Put request failed 
: INSERT INTO `SERDE_PARAMS` (`PARAM_VALUE`,`SERDE_ID`,`PARAM_KEY`) VALUES 
(?,?,?)
NestedThrowables:
org.datanucleus.store.mapped.exceptions.MappedDatastoreException: INSERT INTO 
`SERDE_PARAMS` (`PARAM_VALUE`,`SERDE_ID`,`PARAM_KEY`) VALUES (?,?,?)
FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.DDLTask

You have probably thought of this, but in the short term you can create two 
vertically partitioned tables and do a 1-to-1 join on their key, along these 
lines:
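Something like this, reusing the storage handler and HBase table from your CREATE TABLE (just a sketch; split the columns wherever it keeps each mapping string under the limit):

    create external table hbase_t1_part1
    (
      key string,
      f1_a string,
      f2_a string
    )
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,f1:a,f2:a")
    TBLPROPERTIES("hbase.table.name" = "t1");

    create external table hbase_t1_part2
    (
      key string,
      f1_m string,
      f2_m string
    )
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,f1:m,f2:m")
    TBLPROPERTIES("hbase.table.name" = "t1");

    -- then stitch the halves back together with a 1-to-1 join on the row key
    SELECT p1.key, p1.f1_a, p2.f1_m
    FROM hbase_t1_part1 p1
    JOIN hbase_t1_part2 p2 ON (p1.key = p2.key);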
Edward