[ https://issues.apache.org/jira/browse/IMPALA-13620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17911074#comment-17911074 ]
ASF subversion and git services commented on IMPALA-13620:
----------------------------------------------------------
Commit 01b8b45252d50e4887278ae60b6bcf37c68440bb in impala's branch
refs/heads/master from Riza Suminto
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=01b8b4525 ]
IMPALA-13620: Refresh compute_table_stats.py script
This patch refreshes compute_table_stats.py script with the following
changes:
- Limit parallelism to at most IMPALA_BUILD_THREADS if the --parallelism
argument is not set.
- Change its default connection to hs2, leveraging existing
ImpylaHS2Connection.
- Change OptionParser to ArgumentParser.
- Use impala-python3 to run the script.
- Add --exclude_table_names to skip running COMPUTE STATS on certain
tables/views.
- continue_on_error is False by default.
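
The argument handling described in the list above might look roughly like the following. This is a minimal sketch, not the code from the patch: the helper names (default_parallelism, build_parser, resolve_options) are hypothetical, capping at the smaller of the CPU count and IMPALA_BUILD_THREADS is an assumption about how the limit is applied, and treating --exclude_table_names as a comma-separated string is also an assumption.

```python
import argparse
import multiprocessing
import os


def default_parallelism(env=None):
    """Return a worker count capped at IMPALA_BUILD_THREADS when it is set.

    Assumption: with the variable unset, fall back to the CPU count.
    """
    env = os.environ if env is None else env
    cores = multiprocessing.cpu_count()
    build_threads = env.get("IMPALA_BUILD_THREADS")
    if build_threads is None:
        return cores
    return max(1, min(cores, int(build_threads)))


def build_parser():
    """Build an ArgumentParser with the options named in the commit message."""
    parser = argparse.ArgumentParser(description="Sketch of the script options")
    # None means "not set", so the cap is only applied to the default.
    parser.add_argument("--parallelism", type=int, default=None)
    # Assumption: a comma-separated list of table or view names.
    parser.add_argument("--exclude_table_names", default="")
    parser.add_argument("--continue_on_error", action="store_true",
                        default=False)
    return parser


def resolve_options(argv, env=None):
    """Parse argv and fill in the derived defaults."""
    args = build_parser().parse_args(argv)
    if args.parallelism is None:
        args.parallelism = default_parallelism(env)
    args.excluded = {
        name.strip().lower()
        for name in args.exclude_table_names.split(",")
        if name.strip()
    }
    return args
```

An explicit --parallelism value bypasses the cap in this sketch, which matches the "if --parallelism argument is not set" wording above.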
This patch also improves query handle logging in ImpylaHS2Connection.
collect_profile_and_log argument is added to control whether to pull
logs and runtime profile at the end of __fetch_results(). The default
behavior remains unchanged.
Skip COMPUTE STATS for functional_kudu.alltypesagg and
functional_kudu.manynulls because it is invalid to run COMPUTE STATS
over a view.
Customized hive-site.xml to set datanucleus.connectionPool.maxPoolSize
to 30 and hikaricp.connectionTimeout to 60000 ms. Also set hive.log.dir
to ${IMPALA_CLUSTER_LOGS_DIR}/hive.
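
For reference, the three settings above correspond to hive-site.xml property entries of the usual name/value form. The sketch below only renders that form; how the patch actually generates hive-site.xml is not shown in this message, and render_properties and OVERRIDES are hypothetical names. The ${IMPALA_CLUSTER_LOGS_DIR} placeholder is kept literally rather than expanded.

```python
import xml.etree.ElementTree as ET

# Values taken from the commit message; the timeout is in milliseconds.
OVERRIDES = {
    "datanucleus.connectionPool.maxPoolSize": "30",
    "hikaricp.connectionTimeout": "60000",
    "hive.log.dir": "${IMPALA_CLUSTER_LOGS_DIR}/hive",
}


def render_properties(overrides):
    """Render a <configuration> element with one <property> per override."""
    root = ET.Element("configuration")
    for name, value in overrides.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(render_properties(OVERRIDES))
```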
Testing:
Repeatedly ran compute-table-stats.sh from a cold state and confirmed that
no errors occur. The following script does so from an active minicluster:
cd $IMPALA_HOME
./bin/start-impala-cluster.py --kill
./testdata/bin/kill-hive-server.sh
./testdata/bin/run-hive-server.sh
./bin/start-impala-cluster.py
./testdata/bin/compute-table-stats.sh > /tmp/compute-stats.txt 2>&1
grep error /tmp/compute-stats.txt
Core tests ran and passed.
Change-Id: I1ebf02f95b957e7dda3a30622b87e8fca3197699
Reviewed-on: http://gerrit.cloudera.org:8080/22231
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> Lower default parallelism of compute_table_stats.py
> ---------------------------------------------------
>
> Key: IMPALA-13620
> URL: https://issues.apache.org/jira/browse/IMPALA-13620
> Project: IMPALA
> Issue Type: Improvement
> Components: Infrastructure
> Affects Versions: Impala 4.4.0
> Reporter: Riza Suminto
> Assignee: Riza Suminto
> Priority: Major
>
> compute_table_stats.py might over-parallelize on machines with many cores if
> --parallelism is not set. This over-parallelism seems to overload HMS and
> cause failures in some DDL operations, such as the following.
> {noformat}
> 2024-12-15 07:28:08,946 Thread-2: Failed on table tpch.customer
> Traceback (most recent call last):
> File
> "/data/jenkins/workspace/impala-asf-master-exhaustive-release/repos/Impala/tests/util/compute_table_stats.py",
> line 41, in compute_stats_table
> result = impala_client.execute(statement)
> File
> "/data/jenkins/workspace/impala-asf-master-exhaustive-release/repos/Impala/tests/beeswax/impala_beeswax.py",
> line 188, in execute
> handle = self.__execute_query(query_string.strip(), user=user)
> File
> "/data/jenkins/workspace/impala-asf-master-exhaustive-release/repos/Impala/tests/beeswax/impala_beeswax.py",
> line 284, in __execute_query
> self.wait_for_finished(handle)
> File
> "/data/jenkins/workspace/impala-asf-master-exhaustive-release/repos/Impala/tests/beeswax/impala_beeswax.py",
> line 314, in wait_for_finished
> raise ImpalaBeeswaxException(error_log, None)
> ImpalaBeeswaxException: Query e248e193f7cd1d0d:3a2eee1d00000000 failed:
> ImpalaRuntimeException: Error making 'alter_table' RPC to Hive Metastore:
> CAUSED BY: InvalidOperationException: Alter table in REMOTE database is not
> allowed{noformat}
> The stack trace in CatalogD is as follows:
> {noformat}
> E1215 07:28:08.935083 23281 JniUtil.java:184]
> e248e193f7cd1d0d:3a2eee1d00000000] Error in ALTER_TABLE tpch.customer issued
> by jenkins. Time spent: 1m
> I1215 07:28:08.935757 23281 jni-util.cc:321]
> e248e193f7cd1d0d:3a2eee1d00000000]
> org.apache.impala.common.ImpalaRuntimeException: Error making 'alter_table'
> RPC to Hive Metastore:
> at
> org.apache.impala.service.CatalogOpExecutor.applyAlterTable(CatalogOpExecutor.java:6675)
> at
> org.apache.impala.service.CatalogOpExecutor.applyAlterTable(CatalogOpExecutor.java:6633)
> at
> org.apache.impala.service.CatalogOpExecutor.alterTableUpdateStatsInner(CatalogOpExecutor.java:2004)
> at
> org.apache.impala.service.CatalogOpExecutor.alterTableUpdateStats(CatalogOpExecutor.java:1932)
> at
> org.apache.impala.service.CatalogOpExecutor.alterTable(CatalogOpExecutor.java:1398)
> at
> org.apache.impala.service.CatalogOpExecutor.execDdlRequest(CatalogOpExecutor.java:463)
> at
> org.apache.impala.service.JniCatalog.lambda$execDdl$3(JniCatalog.java:316)
> at
> org.apache.impala.service.JniCatalogOp.lambda$execAndSerialize$1(JniCatalogOp.java:90)
> at org.apache.impala.service.JniCatalogOp.execOp(JniCatalogOp.java:58)
> at
> org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:89)
> at
> org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:100)
> at
> org.apache.impala.service.JniCatalog.execAndSerialize(JniCatalog.java:245)
> at
> org.apache.impala.service.JniCatalog.execAndSerialize(JniCatalog.java:259)
> at org.apache.impala.service.JniCatalog.execDdl(JniCatalog.java:315)
> Caused by: InvalidOperationException(message:Alter table in REMOTE database
> is not allowed)
> at
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_table_req_result$alter_table_req_resultStandardScheme.read(ThriftHiveMetastore.java)
> at
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_table_req_result$alter_table_req_resultStandardScheme.read(ThriftHiveMetastore.java)
> at
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_table_req_result.read(ThriftHiveMetastore.java)
> at
> org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:88)
> at
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_alter_table_req(ThriftHiveMetastore.java:3002)
> at
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.alter_table_req(ThriftHiveMetastore.java:2989)
> at
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.alter_table_with_environmentContext(HiveMetaStoreClient.java:489)
> at
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.alter_table(HiveMetaStoreClient.java:460)
> at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:212)
> at com.sun.proxy.$Proxy11.alter_table(Unknown Source)
> at
> org.apache.impala.service.CatalogOpExecutor.applyAlterTable(CatalogOpExecutor.java:6671)
> ... 13 more
> CAUSED BY: InvalidOperationException: Alter table in REMOTE database is not
> allowed
> @ 0x10e88c4
> @ 0x1cb723a
> @ 0x1099748
> @ 0x102e5a7
> @ 0xfe16b8
> @ 0xfc1393
> @ 0xfc8e5b
> @ 0x15355a2
> @ 0x1d9ccd9
> @ 0x26f5d47
> @ 0x7faf6d8b2ea5
> @ 0x7faf6a7adb0d
> E1215 07:28:08.936064 23281 catalog-server.cc:292]
> e248e193f7cd1d0d:3a2eee1d00000000] ImpalaRuntimeException: Error making
> 'alter_table' RPC to Hive Metastore:
> CAUSED BY: InvalidOperationException: Alter table in REMOTE database is not
> allowed{noformat}
> In my ad-hoc experiment, capping parallelism at 16 lets
> compute-table-stats.sh pass without any error.
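
The idea of capping concurrent COMPUTE STATS statements, as in the experiment described above, can be illustrated with a bounded thread pool. This is a sketch under stated assumptions and not the script's implementation: run_bounded is a hypothetical helper, compute_stats stands in for whatever callable issues the statement for one table, and the default of 16 workers simply mirrors the experiment.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_bounded(tables, compute_stats, max_workers=16,
                continue_on_error=False):
    """Call compute_stats(table) with at most max_workers calls in flight.

    Returns a dict mapping each failed table name to its exception.
    """
    failures = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(compute_stats, t): t for t in tables}
        for fut in as_completed(futures):
            err = fut.exception()
            if err is None:
                continue
            failures[futures[fut]] = err
            if not continue_on_error:
                # Cancel work that has not started, then stop collecting.
                # Calls already running still finish when the pool exits.
                for pending in futures:
                    pending.cancel()
                break
    return failures
```

With continue_on_error left at False, the first failure stops new work from being scheduled, so the HMS is not hit with further DDL after an error has already been seen.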