I updated the patch: http://issues.apache.org/jira/secure/attachment/12447307/HIVE-1364.2.patch.txt
Thanks,
Carl

On Wed, Jun 16, 2010 at 7:31 PM, John Sichi <[email protected]> wrote:

> Looks like that patch has a bug. It should not be changing
> PARTITIONS.PART_NAME, which is an indexed column. Try again, undoing that
> line of the patch first.
>
> JVS
> ________________________________________
> From: Ray Duong [[email protected]]
> Sent: Wednesday, June 16, 2010 5:24 PM
> To: [email protected]
> Subject: Re: Hive-Hbase with large number of columns
>
> Hi,
>
> I applied the patch HIVE-1364 and rebuilt the metastore. I was able to
> create an external table in Hive with a large number of columns (up to
> 4000 bytes).
>
> Now when I try to drop the external table, I get the following error
> message. Is there another file that I need to modify in order to drop
> the table?
>
> hive> drop table hbase_test;
>
> FAILED: Error in metadata: javax.jdo.JDODataStoreException: Error(s) were
> found while auto-creating/validating the datastore for classes. The errors
> are printed in the log, and are attached to this exception.
> NestedThrowables:
> com.mysql.jdbc.exceptions.MySQLSyntaxErrorException: Specified key was too
> long; max key length is 767 bytes
> FAILED: Execution Error, return code 1 from
> org.apache.hadoop.hive.ql.exec.DDLTask
>
> Thanks,
> -ray
>
> On Tue, Jun 15, 2010 at 4:05 PM, John Sichi <[email protected]> wrote:
>
> Column-level properties are attractive for other reasons, but I don't
> think we should do it as a workaround for underlying limits. I've noted
> in JIRA that I think a LOB would be more appropriate here.
>
> Note that while you're waiting for a permanent resolution, you can use
> ALTER TABLE on your metastore DBMS to widen the precision.
>
> JVS
>
> On Jun 15, 2010, at 3:37 PM, Ray Duong wrote:
>
> Yes, I think I might have to do that. I was trying to avoid multiple
> HBase scans with one big table.
>
> BTW, would it be better to store the SERDE properties per column rather
> than at the table level, to avoid the 767- or 4000-byte limitation?
>
> Thanks again,
> -ray
>
> On Tue, Jun 15, 2010 at 2:42 PM, Edward Capriolo <[email protected]> wrote:
>
> On Tue, Jun 15, 2010 at 5:04 PM, Ray Duong <[email protected]> wrote:
>
> Thanks for all the help.
>
> -ray
>
> On Tue, Jun 15, 2010 at 1:26 PM, Carl Steinbach <[email protected]> wrote:
>
> Hi Ray,
>
> 4000 bytes is the maximum VARCHAR size allowed on Oracle 9i/10g/11g. As
> far as I know, this is the smallest maximum VARCHAR size among the
> databases we currently try to support (MySQL, Oracle, Derby, etc.).
>
> Carl
>
> On Tue, Jun 15, 2010 at 1:15 PM, Ray Duong <[email protected]> wrote:
>
> Thanks, John/Carl.
>
> Yep, there seems to be a limit at 767 bytes, and I see the patch
> HIVE-1364 raises it to 4000 bytes. I'm using Derby; do you know if there
> is a limit beyond 4000 bytes?
>
> -ray
>
> Error:
> Caused by: ERROR 22001: A truncation error was encountered trying to shrink
> VARCHAR 'segment:ITC_10#ITC_1001,segment:CITC_10#ITC_1001,segment:ITC&' to
> length 767.
>
> On Tue, Jun 15, 2010 at 12:26 PM, Carl Steinbach <[email protected]> wrote:
>
> Hi Ray,
>
> There is currently a 767-byte size limit on SERDEPROPERTIES values (see
> http://issues.apache.org/jira/browse/HIVE-1364). It's possible that you're
> bumping into this limitation (assuming you abbreviated the column names in
> your example).
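For reference, the interim "ALTER TABLE on your metastore DBMS" workaround
JVS mentions above might look like the sketch below. The SERDE_PARAMS and
PARAM_VALUE names are taken from the INSERT statement in the error message
quoted further down; the exact metastore schema varies by Hive version, so
verify against your own database first. Per JVS's note, do not widen indexed
columns such as PARTITIONS.PART_NAME this way, since MySQL caps index keys
at 767 bytes.

  -- Sketch: widen the SERDEPROPERTIES value column in a MySQL metastore.
  ALTER TABLE SERDE_PARAMS MODIFY PARAM_VALUE VARCHAR(4000);

  -- Rough Derby equivalent (Derby allows increasing a VARCHAR's length
  -- in place with SET DATA TYPE).
  ALTER TABLE SERDE_PARAMS ALTER COLUMN PARAM_VALUE SET DATA TYPE VARCHAR(4000);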
> On Tue, Jun 15, 2010 at 12:03 PM, John Sichi <[email protected]> wrote:
>
> That exception is coming from the metastore (trying to write the table
> definition). Could you dig down into the Hive logs to see if you can get
> the underlying cause?
>
> You can get the logs to spew on the console by adding "-hiveconf
> hive.root.logger=DEBUG,console" to your Hive CLI invocation.
>
> JVS
>
> On Jun 15, 2010, at 11:57 AM, Ray Duong wrote:
>
> Hi,
>
> I'm trying to map an HBase table in Hive that contains a large number of
> columns. Since HBase is designed to be a wide table, does the Hive/HBase
> integration have any set limitation on the number of columns it can map
> in one table? I seem to hit a limit at 10 columns.
>
> Thanks,
> -ray
>
> create external table hbase_t1
> (
>   key string,
>   f1_a string,
>   f2_a string,
>   f1_b string,
>   f2_b string,
>   ...
>   f1_m string,
>   f2_m string
> )
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES ("hbase.columns.mapping" =
> ":key,f1:a,f2:a,f1:b,f2:b,f1:c,f2:c,f1:d,f2:d,f1:e,f2:e,f1:f,f2:f,f1:g,f2:g,f1:h,f2:h,f1:i,f2:i,f1:j,f2:j,f1:k,f2:k,f1:l,f2:l,f1:m,f2:m")
> TBLPROPERTIES ("hbase.table.name" = "t1");
>
> Error Message:
>
> FAILED: Error in metadata: javax.jdo.JDODataStoreException: Put request
> failed : INSERT INTO `SERDE_PARAMS` (`PARAM_VALUE`,`SERDE_ID`,`PARAM_KEY`)
> VALUES (?,?,?)
> NestedThrowables:
> org.datanucleus.store.mapped.exceptions.MappedDatastoreException: INSERT
> INTO `SERDE_PARAMS` (`PARAM_VALUE`,`SERDE_ID`,`PARAM_KEY`) VALUES (?,?,?)
> FAILED: Execution Error, return code 1 from
> org.apache.hadoop.hive.ql.exec.DDLTask
>
> You have probably thought of this, but in the short term you can create
> two vertically partitioned tables and do a 1-to-1 join on their key.
>
> Edward
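A sketch of the two-table workaround Edward describes: map each column
family (or any subset of columns) into its own external table over the same
HBase table, then reassemble rows with a 1-to-1 join on the row key. The
table names and the shortened column lists below are hypothetical.

  create external table hbase_t1_f1 (
    key string,
    a string,
    b string
  )
  STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
  WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,f1:a,f1:b")
  TBLPROPERTIES ("hbase.table.name" = "t1");

  create external table hbase_t1_f2 (
    key string,
    a string,
    b string
  )
  STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
  WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,f2:a,f2:b")
  TBLPROPERTIES ("hbase.table.name" = "t1");

  -- 1-to-1 join on the shared row key to reassemble the wide row.
  SELECT t1.key, t1.a AS f1_a, t1.b AS f1_b, t2.a AS f2_a, t2.b AS f2_b
  FROM hbase_t1_f1 t1
  JOIN hbase_t1_f2 t2 ON (t1.key = t2.key);

Splitting the mapping this way keeps each table's hbase.columns.mapping
string short, so each SERDEPROPERTIES value stays under the metastore's
767-byte limit, at the cost of the extra join (and extra HBase scans).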
