On Tue, Jun 15, 2010 at 5:04 PM, Ray Duong <[email protected]> wrote:
> Thanks for all the help.
>
> -ray
>
> On Tue, Jun 15, 2010 at 1:26 PM, Carl Steinbach <[email protected]> wrote:
>
>> Hi Ray,
>>
>> 4000 bytes is the maximum VARCHAR size allowed on Oracle 9i/10g/11g. As
>> far as I know this is the smallest maximum VARCHAR size among the
>> databases we currently try to support (MySQL, Oracle, Derby, etc.).
>>
>> Carl
>>
>> On Tue, Jun 15, 2010 at 1:15 PM, Ray Duong <[email protected]> wrote:
>>
>>> Thanks, John/Carl,
>>>
>>> Yep, there does seem to be a limit at 767 bytes, and I see that the
>>> HIVE-1364 patch raises it to 4000 bytes. I'm using Derby; do you know
>>> if there is a limit beyond 4000 bytes?
>>>
>>> -ray
>>>
>>> Error:
>>> Caused by: ERROR 22001: A truncation error was encountered trying to
>>> shrink VARCHAR
>>> 'segment:ITC_10#ITC_1001,segment:CITC_10#ITC_1001,segment:ITC&' to
>>> length 767.
>>>
>>> On Tue, Jun 15, 2010 at 12:26 PM, Carl Steinbach <[email protected]> wrote:
>>>
>>>> Hi Ray,
>>>>
>>>> There is currently a 767-byte size limit on SERDEPROPERTIES values
>>>> (see http://issues.apache.org/jira/browse/HIVE-1364). It's possible
>>>> that you're bumping into this limitation (assuming you abbreviated
>>>> the column names in your example).
>>>>
>>>> On Tue, Jun 15, 2010 at 12:03 PM, John Sichi <[email protected]> wrote:
>>>>
>>>>> That exception is coming from the metastore (trying to write the
>>>>> table definition). Could you dig down into the Hive logs to see if
>>>>> you can get the underlying cause?
>>>>>
>>>>> You can get the logs to spew on the console by adding "-hiveconf
>>>>> hive.root.logger=DEBUG,console" to your Hive CLI invocation.
>>>>>
>>>>> JVS
>>>>>
>>>>> On Jun 15, 2010, at 11:57 AM, Ray Duong wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I'm trying to map an HBase table that contains a large number of
>>>>> columns into Hive. Since HBase is designed for wide tables, does the
>>>>> Hive/HBase integration have any set limit on the number of columns
>>>>> it can map in one table? I seem to hit a limit at 10 columns.
>>>>>
>>>>> Thanks,
>>>>> -ray
>>>>>
>>>>> create external table hbase_t1
>>>>> (
>>>>>   key string,
>>>>>   f1_a string,
>>>>>   f2_a string,
>>>>>   f1_b string,
>>>>>   f2_b string,
>>>>>   ...
>>>>>   ...
>>>>>   f1_m string,
>>>>>   f2_m string
>>>>> )
>>>>> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>>>>> WITH SERDEPROPERTIES ("hbase.columns.mapping" =
>>>>> ":key,f1:a,f2:a,f1:b,f2:b,f1:c,f2:c,f1:d,f2:d,f1:e,f2:e,f1:f,f2:f,f1:g,f2:g,f1:h,f2:h,f1:i,f2:i,f1:j,f2:j,f1:k,f2:k,f1:l,f2:l,f1:m,f2:m"
>>>>> )
>>>>> TBLPROPERTIES ("hbase.table.name" = "t1");
>>>>>
>>>>> Error Message:
>>>>>
>>>>> FAILED: Error in metadata: javax.jdo.JDODataStoreException: Put
>>>>> request failed : INSERT INTO `SERDE_PARAMS`
>>>>> (`PARAM_VALUE`,`SERDE_ID`,`PARAM_KEY`) VALUES (?,?,?)
>>>>> NestedThrowables:
>>>>> org.datanucleus.store.mapped.exceptions.MappedDatastoreException:
>>>>> INSERT INTO `SERDE_PARAMS` (`PARAM_VALUE`,`SERDE_ID`,`PARAM_KEY`)
>>>>> VALUES (?,?,?)
>>>>> FAILED: Execution Error, return code 1 from
>>>>> org.apache.hadoop.hive.ql.exec.DDLTask

You have probably thought of this, but in the short term you can create
two vertically partitioned tables and do a one-to-one join on their key.

Edward
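
A minimal HiveQL sketch of that workaround, assuming hypothetical table
names (hbase_t1_part1 / hbase_t1_part2) and an arbitrary split of the
column mapping, might look like:

-- Each Hive table maps only part of the HBase columns, keeping each
-- SERDEPROPERTIES value well under the metastore's VARCHAR limit.
create external table hbase_t1_part1
(
  key string,
  f1_a string,
  f2_a string,
  f1_b string,
  f2_b string
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" =
":key,f1:a,f2:a,f1:b,f2:b")
TBLPROPERTIES ("hbase.table.name" = "t1");

create external table hbase_t1_part2
(
  key string,
  f1_c string,
  f2_c string,
  f1_d string,
  f2_d string
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" =
":key,f1:c,f2:c,f1:d,f2:d")
TBLPROPERTIES ("hbase.table.name" = "t1");

-- Recombine the two halves with a one-to-one join on the row key:
SELECT p1.key, p1.f1_a, p1.f2_a, p1.f1_b, p1.f2_b,
       p2.f1_c, p2.f2_c, p2.f1_d, p2.f2_d
FROM hbase_t1_part1 p1
JOIN hbase_t1_part2 p2 ON (p1.key = p2.key);

Since both are external tables, they can safely point at the same
underlying HBase table "t1", each mapping a disjoint subset of its
columns.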
