[ https://issues.apache.org/jira/browse/IMPALA-9188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16980636#comment-16980636 ]
Anurag Mantripragada commented on IMPALA-9188:
----------------------------------------------

I think the bug is at this line:
[https://github.com/apache/impala/blob/e716e76cccf59c2780571429b1b945d6bbc61b8d/fe/src/main/java/org/apache/impala/analysis/TableDef.java#L497]

For a composite primary key such as (id, year), we generate a unique constraint name for each column, whereas all columns of the key should share the same constraint name. In Hive, the comparator sorts first by constraint name and then, if the constraint names are the same, by key_seq. Because our names differ per column, the columns end up ordered by name rather than by key_seq, which is why the Hive comparator is giving different results.

We should generate a new name only if key_seq is 1; otherwise, we should reuse the existing constraint name. We already do something similar for foreign keys:
[https://github.com/apache/impala/blob/e716e76cccf59c2780571429b1b945d6bbc61b8d/fe/src/main/java/org/apache/impala/analysis/TableDef.java#L565]

> Dataload is failing when USE_CDP_HIVE=true
> ------------------------------------------
>
> Key: IMPALA-9188
> URL: https://issues.apache.org/jira/browse/IMPALA-9188
> Project: IMPALA
> Issue Type: Bug
> Reporter: Sahil Takiar
> Assignee: Anurag Mantripragada
> Priority: Critical
>
> When USE_CDP_HIVE=true, Impala builds are failing during dataload when
> creating tables with PK/FK constraints.
> The error is:
> {code:java}
> ERROR: CREATE EXTERNAL TABLE IF NOT EXISTS
> functional_seq_record_snap.child_table (
> seq int, id int, year string, a int, primary key(seq) DISABLE NOVALIDATE
> RELY, foreign key
> (id, year) references functional_seq_record_snap.parent_table(id, year)
> DISABLE NOVALIDATE
> RELY, foreign key(a) references functional_seq_record_snap.parent_table_2(a)
> DISABLE
> NOVALIDATE RELY)
> row format delimited fields terminated by ','
> LOCATION '/test-warehouse/child_table'
> Traceback (most recent call last):
> File "Impala/bin/load-data.py", line 208, in exec_impala_query_from_file
> result = impala_client.execute(query)
> File "Impala/tests/beeswax/impala_beeswax.py", line 187, in execute
> handle = self.__execute_query(query_string.strip(), user=user)
> File "Impala/tests/beeswax/impala_beeswax.py", line 362, in __execute_query
> handle = self.execute_query_async(query_string, user=user)
> File "Impala/tests/beeswax/impala_beeswax.py", line 356, in execute_query_async
> handle = self.__do_rpc(lambda: self.imp_service.query(query,))
> File "Impala/tests/beeswax/impala_beeswax.py", line 519, in __do_rpc
> raise ImpalaBeeswaxException(self.__build_error_message(b), b)
> ImpalaBeeswaxException: ImpalaBeeswaxException:
> INNER EXCEPTION: <class 'beeswaxd.ttypes.BeeswaxException'>
> MESSAGE: ImpalaRuntimeException: Error making 'createTable' RPC to Hive
> Metastore:
> CAUSED BY: MetaException: Foreign key references id:int;year:string; but no
> corresponding primary key or unique key exists. Possible keys:
> [year:string;id:int;]{code}
> The corresponding error in HMS is:
> {code:java}
> 2019-11-22T06:36:59,937 INFO [pool-10-thread-13] metastore.HiveMetaStore:
> 18: source:127.0.0.1 create_table_req: Table(tableName:child_table,
> dbName:functional_seq_record_gzip, owner:jenkins, createTime:0,
> lastAccessTime:0, retention:0,
> sd:StorageDescriptor(cols:[FieldSchema(name:seq, type:int, comment:null),
> FieldSchema(name:id, type:int, comment:null), FieldSchema(name:year,
> type:string, comment:null), FieldSchema(name:a, type:int, comment:null)],
> location:hdfs://localhost:20500/test-warehouse/child_table,
> inputFormat:org.apache.hadoop.mapred.TextInputFormat,
> outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,
> compressed:false, numBuckets:0, serdeInfo:SerDeInfo(name:null,
> serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe,
> parameters:{serialization.format=,, field.delim=,}), bucketCols:null,
> sortCols:null, parameters:null), partitionKeys:[], parameters:{EXTERNAL=TRUE,
> OBJCAPABILITIES=EXTREAD,EXTWRITE}, viewOriginalText:null,
> viewExpandedText:null, tableType:EXTERNAL_TABLE, catName:hive,
> ownerType:USER, accessType:8)
> 2019-11-22T06:36:59,937 INFO [pool-10-thread-13] HiveMetaStore.audit:
> ugi=jenkins ip=127.0.0.1 cmd=source:127.0.0.1 create_table_req:
> Table(tableName:child_table, dbName:functional_seq_record_gzip,
> owner:jenkins, createTime:0, lastAccessTime:0, retention:0,
> sd:StorageDescriptor(cols:[FieldSchema(name:seq, type:int, comment:null),
> FieldSchema(name:id, type:int, comment:null), FieldSchema(name:year,
> type:string, comment:null), FieldSchema(name:a, type:int, comment:null)],
> location:hdfs://localhost:20500/test-warehouse/child_table,
> inputFormat:org.apache.hadoop.mapred.TextInputFormat,
> outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,
> compressed:false, numBuckets:0, serdeInfo:SerDeInfo(name:null,
> serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe,
> parameters:{serialization.format=,, field.delim=,}), bucketCols:null,
> sortCols:null, parameters:null), partitionKeys:[], parameters:{EXTERNAL=TRUE,
> OBJCAPABILITIES=EXTREAD,EXTWRITE}, viewOriginalText:null,
> viewExpandedText:null, tableType:EXTERNAL_TABLE, catName:hive,
> ownerType:USER, accessType:8)
> 2019-11-22T06:36:59,937 INFO [pool-10-thread-13]
> metastore.MetastoreDefaultTransformer: Starting translation for CreateTable
> for processor Impala3.4.0-SNAPSHOT@localhost with [EXTWRITE, EXTREAD,
> HIVEMANAGEDINSERTREAD, HIVEMANAGEDINSERTWRITE, HIVESQL, HIVEMQT, HIVEBUCKET2]
> on table child_table
> 2019-11-22T06:36:59,937 INFO [pool-10-thread-13]
> metastore.MetastoreDefaultTransformer: Table to be created is of type
> EXTERNAL_TABLE but not MANAGED_TABLE
> 2019-11-22T06:36:59,937 INFO [pool-10-thread-13]
> metastore.MetastoreDefaultTransformer: Transformer returning
> table:Table(tableName:child_table, dbName:functional_seq_record_gzip,
> owner:jenkins, createTime:0, lastAccessTime:0, retention:0,
> sd:StorageDescriptor(cols:[FieldSchema(name:seq, type:int, comment:null),
> FieldSchema(name:id, type:int, comment:null), FieldSchema(name:year,
> type:string, comment:null), FieldSchema(name:a, type:int, comment:null)],
> location:hdfs://localhost:20500/test-warehouse/child_table,
> inputFormat:org.apache.hadoop.mapred.TextInputFormat,
> outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,
> compressed:false, numBuckets:0, serdeInfo:SerDeInfo(name:null,
> serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe,
> parameters:{serialization.format=,, field.delim=,}), bucketCols:null,
> sortCols:null, parameters:null), partitionKeys:[], parameters:{EXTERNAL=TRUE,
> OBJCAPABILITIES=EXTREAD,EXTWRITE}, viewOriginalText:null,
> viewExpandedText:null, tableType:EXTERNAL_TABLE, catName:hive,
> ownerType:USER, accessType:8)
> 2019-11-22T06:36:59,945 ERROR [pool-10-thread-13]
> metastore.RetryingHMSHandler: MetaException(message:Foreign key references
> id:int;year:string; but no corresponding primary key or unique key exists.
> Possible keys: [year:string;id:int;])
> at org.apache.hadoop.hive.metastore.ObjectStore.addForeignKeys(ObjectStore.java:4968)
> at org.apache.hadoop.hive.metastore.ObjectStore.createTableWithConstraints(ObjectStore.java:1289)
> at sun.reflect.GeneratedMethodAccessor71.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97)
> at com.sun.proxy.$Proxy27.createTableWithConstraints(Unknown Source)
> at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:2220)
> at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_req(HiveMetaStore.java:2404)
> at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
> at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
> at com.sun.proxy.$Proxy34.create_table_req(Unknown Source)
> at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$create_table_req.getResult(ThriftHiveMetastore.java:16107)
> at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$create_table_req.getResult(ThriftHiveMetastore.java:16091)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:111)
> at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:107)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
> at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:119)
> at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> Looks like this was caused by IMPALA-9104.
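
To illustrate the naming rule proposed in the comment on this issue, here is a minimal, self-contained sketch. It is not the code in TableDef.java or in Hive: the class `PkNamingSketch`, the `KeyCol` type, the helper methods, and the constraint names `pk_a` / `pk_b` are all hypothetical, and the comparator only mirrors the ordering described in the comment (constraint name first, then key_seq).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

public class PkNamingSketch {
  /** One primary-key column with its constraint name and key_seq (hypothetical type). */
  public static final class KeyCol {
    final String column;
    final String constraintName;
    final int keySeq;

    KeyCol(String column, String constraintName, int keySeq) {
      this.column = column;
      this.constraintName = constraintName;
      this.keySeq = keySeq;
    }
  }

  /** Behaviour described as the bug: every column gets its own constraint name. */
  public static List<KeyCol> namePerColumn(List<String> columns, Supplier<String> newName) {
    List<KeyCol> result = new ArrayList<>();
    for (int i = 0; i < columns.size(); i++) {
      result.add(new KeyCol(columns.get(i), newName.get(), i + 1));
    }
    return result;
  }

  /** Proposed rule: generate a name only when key_seq is 1, otherwise reuse it. */
  public static List<KeyCol> nameSharedPerKey(List<String> columns, Supplier<String> newName) {
    List<KeyCol> result = new ArrayList<>();
    String name = null;
    for (int i = 0; i < columns.size(); i++) {
      int keySeq = i + 1;
      if (keySeq == 1) name = newName.get();
      result.add(new KeyCol(columns.get(i), name, keySeq));
    }
    return result;
  }

  /** Sorts by constraint name, then key_seq, and returns the column order. */
  public static List<String> sortedColumns(List<KeyCol> cols) {
    List<KeyCol> sorted = new ArrayList<>(cols);
    sorted.sort(Comparator.comparing((KeyCol k) -> k.constraintName)
        .thenComparingInt(k -> k.keySeq));
    List<String> names = new ArrayList<>();
    for (KeyCol k : sorted) names.add(k.column);
    return names;
  }

  /** Deterministic stand-in for a name generator: hands out the given names in order. */
  public static Supplier<String> namesInOrder(String... names) {
    Iterator<String> it = Arrays.asList(names).iterator();
    return it::next;
  }

  public static void main(String[] args) {
    List<String> pk = Arrays.asList("id", "year");
    // Per-column names: "year" gets pk_a, which sorts before pk_b, so the order flips.
    System.out.println(sortedColumns(namePerColumn(pk, namesInOrder("pk_b", "pk_a"))));
    // Shared name: the tie is broken by key_seq, so the declared order is kept.
    System.out.println(sortedColumns(nameSharedPerKey(pk, namesInOrder("pk_b", "pk_a"))));
  }
}
```

With per-column names the sketch prints `[year, id]`, the same column order as the "Possible keys" in the error above; with a shared name it prints `[id, year]`.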