[ 
https://issues.apache.org/jira/browse/HIVE-12274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16009447#comment-16009447
 ] 

Remus Rusanu commented on HIVE-12274:
-------------------------------------

[~ngangam] I'm having problems with the PostgreSQL metastore after these changes. 
The CLOB fields are saved by the PG driver as an int handle (see 
https://github.com/pgjdbc/pgjdbc/blob/master/pgjdbc/src/main/java/org/postgresql/jdbc/PgPreparedStatement.java#L1239):

{code}
metastore=# select *, CAST(lo_get(CAST("TYPE_NAME" as bigint)) as TEXT) from 
"COLUMNS_V2"  LIMIT 1;
 CD_ID | COMMENT | COLUMN_NAME | TYPE_NAME | INTEGER_IDX |  lo_get
-------+---------+-------------+-----------+-------------+----------
     2 |         | customer    | 21664     |           0 | \x696e74
(1 row)

metastore=# select version();
                                                    version
----------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.6.2 on x86_64-apple-darwin15.6.0, compiled by Apple LLVM version 
8.0.0 (clang-800.0.42.1), 64-bit
{code}
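
Side note: the {{lo_get}} payload above decodes to plain text ({{\x696e74}} is just 
the UTF-8 bytes of {{int}}), so the real type name does appear to be stored behind 
the large-object handle. A throwaway check:

{code}
import java.nio.charset.StandardCharsets;

public class DecodeLoPayload {
  public static void main(String[] args) {
    // \x696e74 from the lo_get() output above is just the bytes of the type name.
    byte[] payload = {0x69, 0x6e, 0x74};
    System.out.println(new String(payload, StandardCharsets.UTF_8)); // prints "int"
  }
}
{code}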

This causes runtime failures on a partitioned table:

{code}
hive> select * from srcpart;
Failed with exception java.io.IOException:java.lang.IllegalArgumentException: 
Error: type expected at the position 0 of '24030:24031' but '24030' is found.
{code}

The {{24030:24031}} should be {{string:string}}, but it is not translated through 
{{lo_get}} by {{MetaStoreDirectSql.getPartitionsFromPartitionIds}}:

{code}
    // Get FieldSchema stuff if any.
    if (!colss.isEmpty()) {
      // We are skipping the CDS table here, as it seems to be totally useless.
      queryText = "select \"CD_ID\", \"COMMENT\", \"COLUMN_NAME\", \"TYPE_NAME\""
          + " from \"COLUMNS_V2\" where \"CD_ID\" in (" + colIds + ") and \"INTEGER_IDX\" >= 0"
          + " order by \"CD_ID\" asc, \"INTEGER_IDX\" asc";
      loopJoinOrderedResult(colss, queryText, 0, new ApplyFunc<List<FieldSchema>>() {
        @Override
        public void apply(List<FieldSchema> t, Object[] fields) {
          t.add(new FieldSchema((String)fields[2], extractSqlClob(fields[3]), (String)fields[1]));
        }});
{code}
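
For clarity, a rough sketch (my own illustration, not the actual Hive code) of the 
Clob-aware extraction I would expect on this read path; note it only helps when the 
JDBC driver surfaces the column as a {{java.sql.Clob}}, which is exactly what does 
not happen here because the PG driver hands back the large-object OID as a plain 
value:

{code}
import java.sql.Clob;
import java.sql.SQLException;

// Sketch only: turn a raw column value into a String, reading Clobs properly.
// When PostgreSQL stores the text behind a large-object OID, "value" arrives
// as the numeric handle (e.g. 24030), so toString() still returns the OID
// rather than the type name unless the column mapping itself is fixed.
final class ClobExtractSketch {
  static String asString(Object value) throws SQLException {
    if (value == null) {
      return null;
    }
    if (value instanceof Clob) {
      Clob clob = (Clob) value;
      return clob.getSubString(1L, (int) clob.length());
    }
    return value.toString();
  }
}
{code}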

> Increase width of columns used for general configuration in the metastore.
> --------------------------------------------------------------------------
>
>                 Key: HIVE-12274
>                 URL: https://issues.apache.org/jira/browse/HIVE-12274
>             Project: Hive
>          Issue Type: Improvement
>          Components: Metastore
>    Affects Versions: 2.0.0
>            Reporter: Elliot West
>            Assignee: Naveen Gangam
>              Labels: metastore
>             Fix For: 2.3.0, 3.0.0
>
>         Attachments: HIVE-12274.2.patch, HIVE-12274.3.patch, 
> HIVE-12274.4.patch, HIVE-12274.5.patch, HIVE-12274.example.ddl.hql, 
> HIVE-12274.patch
>
>
> h2. Overview
> This issue is very similar in principle to HIVE-1364. We are hitting a limit 
> when processing JSON data that has a large nested schema. The struct 
> definition is truncated when inserted into the metastore database column 
> {{COLUMNS_V2.TYPE_NAME}} as it is greater than 4000 characters in length.
> Given that the purpose of these columns is to hold very loosely defined 
> configuration values, it seems rather limiting to impose such a relatively low 
> length bound. One can imagine that valid use cases will arise where 
> reasonable parameter/property values exceed the current limit. 
> h2. Context
> These limitations were introduced by the [patch 
> attributed|https://github.com/apache/hive/commit/c21a526b0a752df2a51d20a2729cc8493c228799]
>  to HIVE-1364 which mentions the _"max length on Oracle 9i/10g/11g"_ as the 
> reason. However, nowadays the limit can be increased because:
> * Oracle DB's {{varchar2}} supports 32767 bytes now, by setting the 
> configuration parameter {{MAX_STRING_SIZE}} to {{EXTENDED}}. 
> ([source|http://docs.oracle.com/database/121/SQLRF/sql_elements001.htm#SQLRF55623])
> * Postgres supports a max of 1GB for {{character}} datatype. 
> ([source|http://www.postgresql.org/docs/8.3/static/datatype-character.html])
> * MySQL can support up to 65535 bytes for the entire row. So long as the 
> {{PARAM_KEY}} value + {{PARAM_VALUE}} is less than 65535, we should be good. 
> ([source|http://dev.mysql.com/doc/refman/5.0/en/char.html])
> * SQL Server's {{varchar}} has a max length of 8000 and can go beyond that using 
> {{varchar(max)}}, with the same limitation as MySQL of 65535 bytes for the 
> entire row. ([source|http://dev.mysql.com/doc/refman/5.0/en/char.html])
> * Derby's {{varchar}} can be up to 32672 bytes. 
> ([source|https://db.apache.org/derby/docs/10.7/ref/rrefsqlj41207.html])
> h2. Proposal
> Can these columns not use CLOB-like types, as used for example by 
> {{TBLS.VIEW_EXPANDED_TEXT}}? It would seem that suitable type equivalents 
> exist for all targeted database platforms:
> * MySQL: {{mediumtext}}
> * Postgres: {{text}}
> * Oracle: {{CLOB}}
> * Derby: {{LONG VARCHAR}}
> I'd suggest that the candidates for type change are:
> * {{COLUMNS_V2.TYPE_NAME}}
> * {{TABLE_PARAMS.PARAM_VALUE}}
> * {{SERDE_PARAMS.PARAM_VALUE}}
> * {{SD_PARAMS.PARAM_VALUE}}
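> To make the proposal concrete, a rough sketch (illustration only; the DDL differs 
> per database, and the connection URL, credentials, and class name below are 
> placeholders) of the type change for one platform, here using the PostgreSQL 
> {{text}} form over JDBC:
> {code}
> import java.sql.Connection;
> import java.sql.DriverManager;
> import java.sql.Statement;
> 
> // Hypothetical upgrade step: widen the candidate columns on PostgreSQL.
> // The JDBC URL and credentials are placeholders, not real settings.
> public class WidenMetastoreColumns {
>   public static void main(String[] args) throws Exception {
>     String url = "jdbc:postgresql://localhost:5432/metastore";
>     try (Connection conn = DriverManager.getConnection(url, "hive", "hive");
>          Statement stmt = conn.createStatement()) {
>       stmt.execute("ALTER TABLE \"COLUMNS_V2\" ALTER COLUMN \"TYPE_NAME\" TYPE text");
>       stmt.execute("ALTER TABLE \"TABLE_PARAMS\" ALTER COLUMN \"PARAM_VALUE\" TYPE text");
>       stmt.execute("ALTER TABLE \"SERDE_PARAMS\" ALTER COLUMN \"PARAM_VALUE\" TYPE text");
>       stmt.execute("ALTER TABLE \"SD_PARAMS\" ALTER COLUMN \"PARAM_VALUE\" TYPE text");
>     }
>   }
> }
> {code}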
> After updating the maximum length, the metastore database needs to be 
> configured and restarted with the new settings. Altering {{MAX_STRING_SIZE}} 
> will update database objects and possibly invalidate them, as follows:
> * Tables with virtual columns will be updated with new data type metadata for 
> virtual columns of {{VARCHAR2(4000)}}, 4000-byte {{NVARCHAR2}}, or 
> {{RAW(2000)}} type.
> * Functional indexes will become unusable if a change to their associated 
> virtual columns causes the index key to exceed index key length limits. 
> Attempts to rebuild such indexes will fail with {{ORA-01450: maximum key 
> length exceeded}}.
> * Views will be invalidated if they contain {{VARCHAR2(4000)}}, 4000-byte 
> {{NVARCHAR2}}, or {{RAW(2000)}} typed expression columns.
> * Materialized views will be updated with new metadata for {{VARCHAR2(4000)}}, 
> 4000-byte {{NVARCHAR2}}, and {{RAW(2000)}} typed expression columns.
> So the limitation could be raised to 32672 bytes, with the caveat that 
> MySQL and SQL Server limit the row length to 65535 bytes, so that should also 
> be validated to provide consistency.
> Finally, will this limitation persist in the work resulting from HIVE-9452?


