[ 
https://issues.apache.org/jira/browse/HIVE-12274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliot West updated HIVE-12274:
-------------------------------
    Description: 
h2. Overview
This issue is very similar in principle to HIVE-1364. We are hitting a limit 
when processing JSON data that has a large nested schema. The struct definition 
is truncated when inserted into the metastore database column 
{{COLUMNS_V2.YPE_NAME}} as it is greater than 4000 characters in length.

Given that the purpose of these columns is to hold very loosely defined 
configuration values it seems rather limiting to impose such a relatively low 
length bound. One can imagine that valid use cases will arise where reasonable 
parameter/property values exceed the current limit. 

h2. Context
These limitations were in by the [patch 
attributed|https://github.com/apache/hive/commit/c21a526b0a752df2a51d20a2729cc8493c228799]
 to HIVE-1364 which mentions the _"max length on Oracle 9i/10g/11g"_ as the 
reason. However, nowadays the limit can be increased because:
* Oracle DB's {{varchar2}} supports 32767 bytes now, by setting the 
configuration parameter {{MAX_STRING_SIZE}} to {{EXTENDED}}. 
([source|http://docs.oracle.com/database/121/SQLRF/sql_elements001.htm#SQLRF55623])
* Postgres supports a max of 1GB for {{character}} datatype. 
([source|http://www.postgresql.org/docs/8.3/static/datatype-character.html])
* MySQL can support upto 65535 bytes for the entire row. So long as the 
{{PARAM_KEY}} value + {{PARAM_VALUE}} is less than 65535, we should be good. 
([source|http://dev.mysql.com/doc/refman/5.0/en/char.html])
* SQL Server's {{varchar}} max length is 8000 and can go beyond using 
"varchar(max)" with the same limitation as MySQL being 65535 bytes for the 
entire row. (source)
* Derby's {{varchar}} can be upto 32672 bytes. (source)

h2. Proposal
Can these columns not use CLOB-like types as for example as used by 
{{TBLS.VIEW_EXPANDED_TEXT}}? It would seem that suitable type equivalents exist 
for all targeted database platforms:

* MySQL: {{mediumtext}}
* Postgres: {{text}}
* Oracle: {{CLOB}}
* Derby: {{LONG VARCHAR}}

I'd suggest that the candidates for type change are:

* {{COLUMNS_V2.TYPE_NAME}}
* {{TABLE_PARAMS.PARAM_VALUE}}
* {{SERDE_PARAMS.PARAM_VALUE}}
* {{SD_PARAMS.PARAM_VALUE}}

Finally, will this limitation persist in the work resulting from HIVE-9452?

  was:
h2. Overview
This issue is very similar in principle to HIVE-1364. We are hitting a limit 
when processing JSON data that has a large nested schema. The struct definition 
is truncated when inserted into the metastore database column 
{{COLUMNS_V2.YPE_NAME}} as it is greater than 4000 characters in length.

Given that the purpose of these columns is to hold very loosely defined 
configuration values it seems rather limiting to impose such a relatively low 
length bound. One can imagine that valid use cases will arise where reasonable 
parameter/property values exceed the current limit. 

h2. Context
These limitations were in by the [patch 
attributed|https://github.com/apache/hive/commit/c21a526b0a752df2a51d20a2729cc8493c228799]
 to HIVE-1364 which mentions the _"max length on Oracle 9i/10g/11g"_ as the 
reason. However, nowadays the limit can be increased because:
* Oracle DB's {{varchar2}} supports 32767 bytes now, by setting the 
configuration parameter {{MAX_STRING_SIZE}} to {{EXTENDED}}. 
([source|http://docs.oracle.com/database/121/SQLRF/sql_elements001.htm#SQLRF55623])
* Postgres supports a max of 1GB for {{character}} datatype. 
([source|http://www.postgresql.org/docs/8.3/static/datatype-character.html])
* MySQL can support upto 65535 bytes for the entire row. So long as the 
{{PARAM_KEY}} value + {{PARAM_VALUE}} is less than 65535, we should be good. 
(source)
* SQL Server's {{varchar}} max length is 8000 and can go beyond using 
"varchar(max)" with the same limitation as MySQL being 65535 bytes for the 
entire row. (source)
* Derby's {{varchar}} can be upto 32672 bytes. (source)

h2. Proposal
Can these columns not use CLOB-like types as for example as used by 
{{TBLS.VIEW_EXPANDED_TEXT}}? It would seem that suitable type equivalents exist 
for all targeted database platforms:

* MySQL: {{mediumtext}}
* Postgres: {{text}}
* Oracle: {{CLOB}}
* Derby: {{LONG VARCHAR}}

I'd suggest that the candidates for type change are:

* {{COLUMNS_V2.TYPE_NAME}}
* {{TABLE_PARAMS.PARAM_VALUE}}
* {{SERDE_PARAMS.PARAM_VALUE}}
* {{SD_PARAMS.PARAM_VALUE}}

Finally, will this limitation persist in the work resulting from HIVE-9452?


> Increase width of columns used for general configuration in the metastore.
> --------------------------------------------------------------------------
>
>                 Key: HIVE-12274
>                 URL: https://issues.apache.org/jira/browse/HIVE-12274
>             Project: Hive
>          Issue Type: Improvement
>          Components: Metastore
>    Affects Versions: 2.0.0
>            Reporter: Elliot West
>              Labels: metastore
>         Attachments: HIVE-12274.example.ddl.hql
>
>
> h2. Overview
> This issue is very similar in principle to HIVE-1364. We are hitting a limit 
> when processing JSON data that has a large nested schema. The struct 
> definition is truncated when inserted into the metastore database column 
> {{COLUMNS_V2.YPE_NAME}} as it is greater than 4000 characters in length.
> Given that the purpose of these columns is to hold very loosely defined 
> configuration values it seems rather limiting to impose such a relatively low 
> length bound. One can imagine that valid use cases will arise where 
> reasonable parameter/property values exceed the current limit. 
> h2. Context
> These limitations were in by the [patch 
> attributed|https://github.com/apache/hive/commit/c21a526b0a752df2a51d20a2729cc8493c228799]
>  to HIVE-1364 which mentions the _"max length on Oracle 9i/10g/11g"_ as the 
> reason. However, nowadays the limit can be increased because:
> * Oracle DB's {{varchar2}} supports 32767 bytes now, by setting the 
> configuration parameter {{MAX_STRING_SIZE}} to {{EXTENDED}}. 
> ([source|http://docs.oracle.com/database/121/SQLRF/sql_elements001.htm#SQLRF55623])
> * Postgres supports a max of 1GB for {{character}} datatype. 
> ([source|http://www.postgresql.org/docs/8.3/static/datatype-character.html])
> * MySQL can support upto 65535 bytes for the entire row. So long as the 
> {{PARAM_KEY}} value + {{PARAM_VALUE}} is less than 65535, we should be good. 
> ([source|http://dev.mysql.com/doc/refman/5.0/en/char.html])
> * SQL Server's {{varchar}} max length is 8000 and can go beyond using 
> "varchar(max)" with the same limitation as MySQL being 65535 bytes for the 
> entire row. (source)
> * Derby's {{varchar}} can be upto 32672 bytes. (source)
> h2. Proposal
> Can these columns not use CLOB-like types as for example as used by 
> {{TBLS.VIEW_EXPANDED_TEXT}}? It would seem that suitable type equivalents 
> exist for all targeted database platforms:
> * MySQL: {{mediumtext}}
> * Postgres: {{text}}
> * Oracle: {{CLOB}}
> * Derby: {{LONG VARCHAR}}
> I'd suggest that the candidates for type change are:
> * {{COLUMNS_V2.TYPE_NAME}}
> * {{TABLE_PARAMS.PARAM_VALUE}}
> * {{SERDE_PARAMS.PARAM_VALUE}}
> * {{SD_PARAMS.PARAM_VALUE}}
> Finally, will this limitation persist in the work resulting from HIVE-9452?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to