Riza Suminto created IMPALA-13620:
-------------------------------------
Summary: Lower default parallelism of compute_table_stats.py
Key: IMPALA-13620
URL: https://issues.apache.org/jira/browse/IMPALA-13620
Project: IMPALA
Issue Type: Improvement
Components: Infrastructure
Affects Versions: Impala 4.4.0
Reporter: Riza Suminto
Assignee: Riza Suminto
compute_table_stats.py may run with too much parallelism on machines with many cores
if --parallelism is not set. This overparallelism seems to overload HMS and cause
failures in some DDL operations, such as the following:
{noformat}
2024-12-15 07:28:08,946 Thread-2: Failed on table tpch.customer
Traceback (most recent call last):
  File "/data/jenkins/workspace/impala-asf-master-exhaustive-release/repos/Impala/tests/util/compute_table_stats.py", line 41, in compute_stats_table
    result = impala_client.execute(statement)
  File "/data/jenkins/workspace/impala-asf-master-exhaustive-release/repos/Impala/tests/beeswax/impala_beeswax.py", line 188, in execute
    handle = self.__execute_query(query_string.strip(), user=user)
  File "/data/jenkins/workspace/impala-asf-master-exhaustive-release/repos/Impala/tests/beeswax/impala_beeswax.py", line 284, in __execute_query
    self.wait_for_finished(handle)
  File "/data/jenkins/workspace/impala-asf-master-exhaustive-release/repos/Impala/tests/beeswax/impala_beeswax.py", line 314, in wait_for_finished
    raise ImpalaBeeswaxException(error_log, None)
ImpalaBeeswaxException: Query e248e193f7cd1d0d:3a2eee1d00000000 failed:
ImpalaRuntimeException: Error making 'alter_table' RPC to Hive Metastore:
CAUSED BY: InvalidOperationException: Alter table in REMOTE database is not allowed
{noformat}
The stack trace in CatalogD is as follows:
{noformat}
E1215 07:28:08.935083 23281 JniUtil.java:184] e248e193f7cd1d0d:3a2eee1d00000000] Error in ALTER_TABLE tpch.customer issued by jenkins. Time spent: 1m
I1215 07:28:08.935757 23281 jni-util.cc:321] e248e193f7cd1d0d:3a2eee1d00000000] org.apache.impala.common.ImpalaRuntimeException: Error making 'alter_table' RPC to Hive Metastore:
    at org.apache.impala.service.CatalogOpExecutor.applyAlterTable(CatalogOpExecutor.java:6675)
    at org.apache.impala.service.CatalogOpExecutor.applyAlterTable(CatalogOpExecutor.java:6633)
    at org.apache.impala.service.CatalogOpExecutor.alterTableUpdateStatsInner(CatalogOpExecutor.java:2004)
    at org.apache.impala.service.CatalogOpExecutor.alterTableUpdateStats(CatalogOpExecutor.java:1932)
    at org.apache.impala.service.CatalogOpExecutor.alterTable(CatalogOpExecutor.java:1398)
    at org.apache.impala.service.CatalogOpExecutor.execDdlRequest(CatalogOpExecutor.java:463)
    at org.apache.impala.service.JniCatalog.lambda$execDdl$3(JniCatalog.java:316)
    at org.apache.impala.service.JniCatalogOp.lambda$execAndSerialize$1(JniCatalogOp.java:90)
    at org.apache.impala.service.JniCatalogOp.execOp(JniCatalogOp.java:58)
    at org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:89)
    at org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:100)
    at org.apache.impala.service.JniCatalog.execAndSerialize(JniCatalog.java:245)
    at org.apache.impala.service.JniCatalog.execAndSerialize(JniCatalog.java:259)
    at org.apache.impala.service.JniCatalog.execDdl(JniCatalog.java:315)
Caused by: InvalidOperationException(message:Alter table in REMOTE database is not allowed)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_table_req_result$alter_table_req_resultStandardScheme.read(ThriftHiveMetastore.java)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_table_req_result$alter_table_req_resultStandardScheme.read(ThriftHiveMetastore.java)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_table_req_result.read(ThriftHiveMetastore.java)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:88)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_alter_table_req(ThriftHiveMetastore.java:3002)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.alter_table_req(ThriftHiveMetastore.java:2989)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.alter_table_with_environmentContext(HiveMetaStoreClient.java:489)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.alter_table(HiveMetaStoreClient.java:460)
    at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:212)
    at com.sun.proxy.$Proxy11.alter_table(Unknown Source)
    at org.apache.impala.service.CatalogOpExecutor.applyAlterTable(CatalogOpExecutor.java:6671)
    ... 13 more
CAUSED BY: InvalidOperationException: Alter table in REMOTE database is not allowed
@ 0x10e88c4
@ 0x1cb723a
@ 0x1099748
@ 0x102e5a7
@ 0xfe16b8
@ 0xfc1393
@ 0xfc8e5b
@ 0x15355a2
@ 0x1d9ccd9
@ 0x26f5d47
@ 0x7faf6d8b2ea5
@ 0x7faf6a7adb0d
E1215 07:28:08.936064 23281 catalog-server.cc:292] e248e193f7cd1d0d:3a2eee1d00000000] ImpalaRuntimeException: Error making 'alter_table' RPC to Hive Metastore:
CAUSED BY: InvalidOperationException: Alter table in REMOTE database is not allowed
{noformat}
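A simplified model of why the default matters, not the script's actual code: if the per-table statements are dispatched through a thread pool (the traceback shows worker threads such as Thread-2 calling compute_stats_table), then the pool size is the peak number of statements, and hence catalog-to-HMS alter_table RPCs, in flight at once. The use of concurrent.futures.ThreadPoolExecutor here is an assumption about how compute_table_stats.py dispatches work, and InFlightCounter, fake_compute_stats and peak_concurrency are hypothetical names used only for illustration.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class InFlightCounter:
    """Tracks how many simulated statements are executing at the same time."""

    def __init__(self):
        self._lock = threading.Lock()
        self.current = 0
        self.peak = 0

    def __enter__(self):
        with self._lock:
            self.current += 1
            self.peak = max(self.peak, self.current)
        return self

    def __exit__(self, *exc):
        with self._lock:
            self.current -= 1
        return False


def fake_compute_stats(table, counter):
    # Hypothetical stand-in for impala_client.execute(statement); in the real
    # run, each such statement ends in an alter_table RPC from CatalogD to HMS.
    with counter:
        time.sleep(0.01)
    return table


def peak_concurrency(tables, parallelism):
    """Run all tables through a pool of `parallelism` workers; return peak overlap."""
    counter = InFlightCounter()
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        list(pool.map(lambda t: fake_compute_stats(t, counter), tables))
    return counter.peak


tables = ["tpch.t%d" % i for i in range(64)]
print(peak_concurrency(tables, 4))   # never exceeds 4
print(peak_concurrency(tables, 48))  # bounded by 48, the pool size
```

The point of the model is only that concurrency against HMS scales with the worker count, so a default that grows with the core count grows the load on HMS as well.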
In an ad-hoc experiment, capping parallelism at 16 let compute-table-stats.sh
pass without any errors.
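One possible shape for the improvement, as a minimal sketch: keep honoring an explicit --parallelism, but when it is not set, derive the default from the CPU count and cap it at 16, the value that passed in the experiment above. The issue does not show how the script currently parses arguments or computes its default, so the argparse setup, the `default=None` convention, and the names resolve_parallelism and MAX_DEFAULT_PARALLELISM are all hypothetical.

```python
import argparse
import os

# Hypothetical cap, taken from the ad-hoc experiment (16 passed cleanly).
MAX_DEFAULT_PARALLELISM = 16


def resolve_parallelism(requested, cpu_count=None):
    """Return the number of worker threads to use.

    An explicit --parallelism value is honored as-is. Otherwise fall back to
    the machine's CPU count, capped at MAX_DEFAULT_PARALLELISM and floored at 1.
    """
    if requested is not None:
        return requested
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, min(cpu_count, MAX_DEFAULT_PARALLELISM))


parser = argparse.ArgumentParser()
parser.add_argument("--parallelism", type=int, default=None,
                    help="Number of tables to process concurrently")
# Empty argv for demonstration; a real script would parse sys.argv.
args = parser.parse_args([])
print(resolve_parallelism(args.parallelism))
```

Capping only the default, rather than the flag itself, keeps large-core machines from overloading HMS out of the box while still letting a user opt into higher parallelism deliberately.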
--
This message was sent by Atlassian Jira
(v8.20.10#820010)