[ 
https://issues.apache.org/jira/browse/HIVE-12274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15926881#comment-15926881
 ] 

Naveen Gangam commented on HIVE-12274:
--------------------------------------

Much better. Looks like all the failures from the TestPerfCliDriver have been 
resolved. The 10 failures from the TestSparkCliDriver seem to be random, 
perhaps flaky ones?. I have run these 10 test successully locally multiple 
times now. No failures thus far. Amongst the rest of the failures, the 
testTableFilter one's are related to the patch because the API uses a query on 
a CLOB field that cannot be used for String comparison.
{code}
SELECT A0.TBL_NAME FROM TBLS A0 LEFT OUTER JOIN DBS B0 ON A0.DB_ID = B0.DB_ID 
INNER JOIN TABLE_PARAMS C0 ON A0.TBL_ID = C0.TBL_ID WHERE C0.PARAM_KEY = 
'test_param_2' AND B0."NAME" = ? AND C0.PARAM_VALUE = ?
        at 
org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:677)
        at org.datanucleus.api.jdo.JDOQuery.executeInternal(JDOQuery.java:391)
        at org.datanucleus.api.jdo.JDOQuery.executeWithMap(JDOQuery.java:279)
        at 
org.apache.hadoop.hive.metastore.ObjectStore.listTableNamesByFilter(ObjectStore.java:3198)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:101)
        at com.sun.proxy.$Proxy19.listTableNamesByFilter(Unknown Source)
        at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table_names_by_filter(HiveMetaStore.java:2140)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
        at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
        at com.sun.proxy.$Proxy25.get_table_names_by_filter(Unknown Source)
        at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_table_names_by_filter.getResult(ThriftHiveMetastore.java:11511)
        at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_table_names_by_filter.getResult(ThriftHiveMetastore.java:11495)
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
        at 
org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
        at 
org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at 
org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
        at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
NestedThrowablesStackTrace:
java.sql.SQLSyntaxErrorException: Comparisons between 'CLOB (UCS_BASIC)' and 
'CLOB (UCS_BASIC)' are not supported. Types must be comparable. String types 
must also have matching collation. If collation does not match, a possible 
solution is to cast operands to force them to the default collation (e.g. 
SELECT tablename FROM sys.systables WHERE CAST(tablename AS VARCHAR(128)) = 
'T1')
        at 
org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source)
        at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown 
Source)
        at 
org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown 
Source)
        at 
org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown 
Source)
        at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown 
Source)
        at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown 
Source)
        at org.apache.derby.impl.jdbc.EmbedPreparedStatement.<init>(Unknown 
Source)
        at org.apache.derby.impl.jdbc.EmbedPreparedStatement20.<init>(Unknown 
Source)
        at org.apache.derby.impl.jdbc.EmbedPreparedStatement30.<init>(Unknown 
Source)
        at org.apache.derby.impl.jdbc.EmbedPreparedStatement40.<init>(Unknown 
Source)
        at org.apache.derby.impl.jdbc.EmbedPreparedStatement42.<init>(Unknown 
Source)
        at org.apache.derby.jdbc.Driver42.newEmbedPreparedStatement(Unknown 
Source)
        at org.apache.derby.impl.jdbc.EmbedConnection.prepareStatement(Unknown 
Source)
        at org.apache.derby.impl.jdbc.EmbedConnection.prepareStatement(Unknown 
Source)
        at 
com.jolbox.bonecp.ConnectionHandle.prepareStatement(ConnectionHandle.java:1193)
        at 
org.datanucleus.store.rdbms.SQLController.getStatementForQuery(SQLController.java:345)
        at 
org.datanucleus.store.rdbms.query.RDBMSQueryUtils.getPreparedStatementForQuery(RDBMSQueryUtils.java:211)
        at 
org.datanucleus.store.rdbms.query.JDOQLQuery.performExecute(JDOQLQuery.java:609)
        at org.datanucleus.store.query.Query.executeQuery(Query.java:1855)
        at org.datanucleus.store.query.Query.executeWithMap(Query.java:1762)
        at org.datanucleus.api.jdo.JDOQuery.executeInternal(JDOQuery.java:372)
        at org.datanucleus.api.jdo.JDOQuery.executeWithMap(JDOQuery.java:279)
        at 
org.apache.hadoop.hive.metastore.ObjectStore.listTableNamesByFilter(ObjectStore.java:3198)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:101)
        at com.sun.proxy.$Proxy19.listTableNamesByFilter(Unknown Source)
        at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table_names_by_filter(HiveMetaStore.java:2140)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
        at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
        at com.sun.proxy.$Proxy25.get_table_names_by_filter(Unknown Source)
{code}

I am taking a look at these.

> Increase width of columns used for general configuration in the metastore.
> --------------------------------------------------------------------------
>
>                 Key: HIVE-12274
>                 URL: https://issues.apache.org/jira/browse/HIVE-12274
>             Project: Hive
>          Issue Type: Improvement
>          Components: Metastore
>    Affects Versions: 2.0.0
>            Reporter: Elliot West
>            Assignee: Naveen Gangam
>              Labels: metastore
>         Attachments: HIVE-12274.2.patch, HIVE-12274.3.patch, 
> HIVE-12274.4.patch, HIVE-12274.example.ddl.hql, HIVE-12274.patch
>
>
> h2. Overview
> This issue is very similar in principle to HIVE-1364. We are hitting a limit 
> when processing JSON data that has a large nested schema. The struct 
> definition is truncated when inserted into the metastore database column 
> {{COLUMNS_V2.YPE_NAME}} as it is greater than 4000 characters in length.
> Given that the purpose of these columns is to hold very loosely defined 
> configuration values it seems rather limiting to impose such a relatively low 
> length bound. One can imagine that valid use cases will arise where 
> reasonable parameter/property values exceed the current limit. 
> h2. Context
> These limitations were in by the [patch 
> attributed|https://github.com/apache/hive/commit/c21a526b0a752df2a51d20a2729cc8493c228799]
>  to HIVE-1364 which mentions the _"max length on Oracle 9i/10g/11g"_ as the 
> reason. However, nowadays the limit can be increased because:
> * Oracle DB's {{varchar2}} supports 32767 bytes now, by setting the 
> configuration parameter {{MAX_STRING_SIZE}} to {{EXTENDED}}. 
> ([source|http://docs.oracle.com/database/121/SQLRF/sql_elements001.htm#SQLRF55623])
> * Postgres supports a max of 1GB for {{character}} datatype. 
> ([source|http://www.postgresql.org/docs/8.3/static/datatype-character.html])
> * MySQL can support upto 65535 bytes for the entire row. So long as the 
> {{PARAM_KEY}} value + {{PARAM_VALUE}} is less than 65535, we should be good. 
> ([source|http://dev.mysql.com/doc/refman/5.0/en/char.html])
> * SQL Server's {{varchar}} max length is 8000 and can go beyond using 
> "varchar(max)" with the same limitation as MySQL being 65535 bytes for the 
> entire row. ([source|http://dev.mysql.com/doc/refman/5.0/en/char.html])
> * Derby's {{varchar}} can be upto 32672 bytes. 
> ([source|https://db.apache.org/derby/docs/10.7/ref/rrefsqlj41207.html])
> h2. Proposal
> Can these columns not use CLOB-like types as for example as used by 
> {{TBLS.VIEW_EXPANDED_TEXT}}? It would seem that suitable type equivalents 
> exist for all targeted database platforms:
> * MySQL: {{mediumtext}}
> * Postgres: {{text}}
> * Oracle: {{CLOB}}
> * Derby: {{LONG VARCHAR}}
> I'd suggest that the candidates for type change are:
> * {{COLUMNS_V2.TYPE_NAME}}
> * {{TABLE_PARAMS.PARAM_VALUE}}
> * {{SERDE_PARAMS.PARAM_VALUE}}
> * {{SD_PARAMS.PARAM_VALUE}}
> After updating the maximum length the metastore database needs to be 
> configured and restarted with the new settings. Altering {{MAX_STRING_SIZE}} 
> will update database objects and possibly invalidate them, as follows:
> * Tables with virtual columns will be updated with new data type metadata for 
> virtual columns of {{VARCHAR2(4000)}}, 4000-byte {{NVARCHAR2}}, or 
> {{RAW(2000)}} type.
> * Functional indexes will become unusable if a change to their associated 
> virtual columns causes the index key to exceed index key length limits. 
> Attempts to rebuild such indexes will fail with {{ORA-01450: maximum key 
> length exceeded}}.
> * Views will be invalidated if they contain {{VARCHAR2(4000)}}, 4000-byte 
> {{NVARCHAR2}}, or {{RAW(2000)}} typed expression columns.
> * Materialized views will be updated with new metadata {{VARCHAR2(4000)}}, 
> 4000-byte {{NVARCHAR2}}, and {{RAW(2000)}} typed expression columns
> * So the limitation could be raised to 32672 bytes, with the caveat that 
> MySQL and SQL Server limit the row length to 65535 bytes, so that should also 
> be validated to provide consistency.
> Finally, will this limitation persist in the work resulting from HIVE-9452?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to