[
https://issues.apache.org/jira/browse/HIVE-12274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Elliot West updated HIVE-12274:
-------------------------------
Description:
h2. Overview
This issue is very similar in principle to HIVE-1364. We are hitting a limit
when processing JSON data that has a large nested schema. The struct definition
is truncated when inserted into the metastore database column
{{COLUMNS_V2.YPE_NAME}} as it is greater than 4000 characters in length.
Given that the purpose of these columns is to hold very loosely defined
configuration values it seems rather limiting to impose such a relatively low
length bound. One can imagine that valid use cases will arise where reasonable
parameter/property values exceed the current limit.
h2. Context
These limitations were in by the patch attributed to HIVE-1364 which mentions
the _"max length on Oracle 9i/10g/11g"_ as the reason. However, nowadays the
limit can be increased because:
* Oracle DB's {{varchar2}} supports 32767 bytes now, by setting the
configuration parameter {{MAX_STRING_SIZE}} to {{EXTENDED}}. (source)
* Postgres supports a max of 1GB for {{character}} datatype. (source)
* MySQL can support upto 65535 bytes for the entire row. So long as the
{{PARAM_KEY}} value + {{PARAM_VALUE}} is less than 65535, we should be good.
(source)
* SQL Server's {{varchar}} max length is 8000 and can go beyond using
"varchar(max)" with the same limitation as MySQL being 65535 bytes for the
entire row. (source)
* Derby's {{varchar}} can be upto 32672 bytes. (source)
h2. Proposal
Can these columns not use CLOB-like types as for example as used by
{{TBLS.VIEW_EXPANDED_TEXT}}? It would seem that suitable type equivalents exist
for all targeted database platforms:
* MySQL: {{mediumtext}}
* Postgres: {{text}}
* Oracle: {{CLOB}}
* Derby: {{LONG VARCHAR}}
I'd suggest that the candidates for type change are:
* {{COLUMNS_V2.TYPE_NAME}}
* {{TABLE_PARAMS.PARAM_VALUE}}
* {{SERDE_PARAMS.PARAM_VALUE}}
* {{SD_PARAMS.PARAM_VALUE}}
Finally, will this limitation persist in the work resulting from HIVE-9452?
was:
This issue is very similar in principle to HIVE-1364. We are hitting a limit
when processing JSON data that has a large nested schema. The struct definition
is truncated when inserted into the metastore database column
{{COLUMNS_V2.YPE_NAME}} as it is greater than 4000 characters in length.
Given that the purpose of these columns is to hold very loosely defined
configuration values it seems rather limiting to impose such a relatively low
length bound. One can imagine that valid use cases will arise where reasonable
parameter/property values exceed the current limit. Can these columns not use
CLOB-like types as for example as used by {{TBLS.VIEW_EXPANDED_TEXT}}? It would
seem that suitable type equivalents exist for all targeted database platforms:
* MySQL: {{mediumtext}}
* Postgres: {{text}}
* Oracle: {{CLOB}}
* Derby: {{LONG VARCHAR}}
I'd suggest that the candidates for type change are:
* {{COLUMNS_V2.TYPE_NAME}}
* {{TABLE_PARAMS.PARAM_VALUE}}
* {{SERDE_PARAMS.PARAM_VALUE}}
* {{SD_PARAMS.PARAM_VALUE}}
Finally, will this limitation persist in the work resulting from HIVE-9452?
> Increase width of columns used for general configuration in the metastore.
> --------------------------------------------------------------------------
>
> Key: HIVE-12274
> URL: https://issues.apache.org/jira/browse/HIVE-12274
> Project: Hive
> Issue Type: Improvement
> Components: Metastore
> Affects Versions: 2.0.0
> Reporter: Elliot West
> Labels: metastore
> Attachments: HIVE-12274.example.ddl.hql
>
>
> h2. Overview
> This issue is very similar in principle to HIVE-1364. We are hitting a limit
> when processing JSON data that has a large nested schema. The struct
> definition is truncated when inserted into the metastore database column
> {{COLUMNS_V2.YPE_NAME}} as it is greater than 4000 characters in length.
> Given that the purpose of these columns is to hold very loosely defined
> configuration values it seems rather limiting to impose such a relatively low
> length bound. One can imagine that valid use cases will arise where
> reasonable parameter/property values exceed the current limit.
> h2. Context
> These limitations were in by the patch attributed to HIVE-1364 which mentions
> the _"max length on Oracle 9i/10g/11g"_ as the reason. However, nowadays the
> limit can be increased because:
> * Oracle DB's {{varchar2}} supports 32767 bytes now, by setting the
> configuration parameter {{MAX_STRING_SIZE}} to {{EXTENDED}}. (source)
> * Postgres supports a max of 1GB for {{character}} datatype. (source)
> * MySQL can support upto 65535 bytes for the entire row. So long as the
> {{PARAM_KEY}} value + {{PARAM_VALUE}} is less than 65535, we should be good.
> (source)
> * SQL Server's {{varchar}} max length is 8000 and can go beyond using
> "varchar(max)" with the same limitation as MySQL being 65535 bytes for the
> entire row. (source)
> * Derby's {{varchar}} can be upto 32672 bytes. (source)
> h2. Proposal
> Can these columns not use CLOB-like types as for example as used by
> {{TBLS.VIEW_EXPANDED_TEXT}}? It would seem that suitable type equivalents
> exist for all targeted database platforms:
> * MySQL: {{mediumtext}}
> * Postgres: {{text}}
> * Oracle: {{CLOB}}
> * Derby: {{LONG VARCHAR}}
> I'd suggest that the candidates for type change are:
> * {{COLUMNS_V2.TYPE_NAME}}
> * {{TABLE_PARAMS.PARAM_VALUE}}
> * {{SERDE_PARAMS.PARAM_VALUE}}
> * {{SD_PARAMS.PARAM_VALUE}}
> Finally, will this limitation persist in the work resulting from HIVE-9452?
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)