[jira] [Commented] (HIVE-8295) Add batch retrieve partition objects for metastore direct sql

Sergey Shelukhin (JIRA) Sun, 09 Nov 2014 21:48:06 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-8295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14204385#comment-14204385
 ]


Sergey Shelukhin commented on HIVE-8295:
----------------------------------------

Nit: is it possible to make batch size static final?
Also, given that this is not necessary for normal databases, is it possible to 
either make it configurable, off by default, or detect Oracle if that is easy? 
Like the way it does quote handling in MySQL

> Add batch retrieve partition objects for metastore direct sql 
> --------------------------------------------------------------
>
>                 Key: HIVE-8295
>                 URL: https://issues.apache.org/jira/browse/HIVE-8295
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Selina Zhang
>            Assignee: Selina Zhang
>         Attachments: HIVE-8295.1.patch
>
>
> Currently in MetastoreDirectSql partition objects are constructed in a way 
> that fetching partition ids first. However, if the partition ids that match 
> the filter is larger than 1000, direct sql will fail with the following stack 
> trace:
> {code}
> 2014-09-29 19:30:02,942 DEBUG [pool-1-thread-1] metastore.MetaStoreDirectSql 
> (MetaStoreDirectSql.java:timingTrace(604)) - Direct SQL query in 122.085893ms 
> + 13.048901ms, the query is [select "PARTITIONS"."PART_ID" from "PARTITIONS"  
> inner join "TBLS" on "PARTITIONS"."TBL_ID" = "TBLS"."TBL_ID"     and 
> "TBLS"."TBL_NAME" = ?   inner join "DBS" on "TBLS"."DB_ID" = "DBS"."DB_ID"    
>   and "DBS"."NAME" = ? inner join "PARTITION_KEY_VALS" "FILTER2" on 
> "FILTER2"."PART_ID" = "PARTITIONS"."PART_ID" and "FILTER2"."INTEGER_IDX" = 2 
> where (("FILTER2"."PART_KEY_VAL" = ?))]
> 2014-09-29 19:30:02,949 ERROR [pool-1-thread-1] metastore.ObjectStore 
> (ObjectStore.java:handleDirectSqlError(2248)) - Direct SQL failed, falling 
> back to ORM
> javax.jdo.JDODataStoreException: Error executing SQL query "select 
> "PARTITIONS"."PART_ID", "SDS"."SD_ID", "SDS"."CD_ID", "SERDES"."SERDE_ID", 
> "PARTITIONS"."CREATE_TIME", "PARTITIONS"."LAST_ACCESS_TIME", 
> "SDS"."INPUT_FORMAT", "SDS"."IS_COMPRESSED", 
> "SDS"."IS_STOREDASSUBDIRECTORIES", "SDS"."LOCATION", "SDS"."NUM_BUCKETS", 
> "SDS"."OUTPUT_FORMAT", "SERDES"."NAME", "SERDES"."SLIB" from "PARTITIONS"  
> left outer join "SDS" on "PARTITIONS"."SD_ID" = "SDS"."SD_ID"   left outer 
> join "SERDES" on "SDS"."SERDE_ID" = "SERDES"."SERDE_ID" where "PART_ID" in 
> (136,140,143,147,152,156,160,163,167,171,174,180,185,191,196,198,203,208,212,217...
> ) order by "PART_NAME" asc".
>     at 
> org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:422)
>     at org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:321)
>     at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:331)
>     at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:211)
>     at 
> org.apache.hadoop.hive.metastore.ObjectStore$3.getSqlResult(ObjectStore.java:1920)
>     at 
> org.apache.hadoop.hive.metastore.ObjectStore$3.getSqlResult(ObjectStore.java:1914)
>     at 
> org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:2213)
>     at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:1914)
>     at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExpr(ObjectStore.java:1887)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:98)
>     at com.sun.proxy.$Proxy8.getPartitionsByExpr(Unknown Source)
>     at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_expr(HiveMetaStore.java:3800)
>     at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_expr.getResult(ThriftHiveMetastore.java:9366)
>     at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_expr.getResult(ThriftHiveMetastore.java:9350)
>     at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>     at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>     at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge20S.java:617)
>     at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge20S.java:613)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
>     at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge20S.java:613)
>     at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody0(TThreadPoolServer.java:206)
>     at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody1$advice(TThreadPoolServer.java:101)
>     at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:1)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> NestedThrowablesStackTrace:
> java.sql.SQLSyntaxErrorException: ORA-01795: maximum number of expressions in 
> a list is 1000
> {code}
> Add retrieve partition objects in batch for direct sql will solve this Oracle 
> specific problem. And it also bring some performance benefit and will reduce 
> the memory footprint. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-8295) Add batch retrieve partition objects for metastore direct sql

Reply via email to