[
https://issues.apache.org/jira/browse/IMPALA-12856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17821999#comment-17821999
]
Quanlong Huang commented on IMPALA-12856:
-----------------------------------------
To reproduce the issue, I use two Impala clusters. One adds, drops and reloads
a partition. The other one just processing the RELOAD events. It takes several
minutes to hit this bug.
Start the first Impala cluster with --enable_reload_events=true
{code:bash}
bin/start-impala-cluster.py --catalogd_args="--enable_reload_events=true"{code}
Start the second Impala cluster manually using other free ports
{code:bash}
be/build/latest/service/statestored -logbufsecs=5 -v=1 -max_log_files=10
-log_rotation_match_pid=true -log_filename=statestored_cluster2
-log_dir=${IMPALA_HOME}/logs/cluster -state_store_port=24001
-webserver_port=25011 &
CLASSPATH="fe/target/impala-frontend-4.4.0-SNAPSHOT.jar:fe/src/test/resources/:fe/target/dependency/*"
be/build/latest/service/catalogd -logbufsecs=5 -v=1 -max_log_files=10
-log_rotation_match_pid=true -log_filename=catalogd_cluster2
-log_dir=${IMPALA_HOME}/logs/cluster -catalog_service_port=26001
-state_store_subscriber_port=23021 -webserver_port=25021
-state_store_port=24001 --enable_reload_events=true &
CLASSPATH="fe/target/impala-frontend-4.4.0-SNAPSHOT.jar:fe/src/test/resources/:fe/target/dependency/*"
be/build/latest/service/impalad
-jni_frontend_class=org/apache/impala/service/JniFrontend -mem_limit=500m
-logbufsecs=5 -v=1 -max_log_files=10 -log_rotation_match_pid=true
-log_filename=impalad_cluster2 -log_dir=${IMPALA_HOME}/logs/cluster
-beeswax_port=41000 -hs2_port=41050 -hs2_http_port=48000 -krpc_port=47000
-state_store_subscriber_port=43000 -webserver_port=45000
-geospatial_library=HIVE_ESRI -state_store_port=24001 &
{code}
Create a partitioned table in cluster1.
{code:sql}
create table part_tbl (i int) partitioned by (p int);{code}
Make this table loaded in cluster2 using a DESCRIBE command:
{code:bash}
impala-shell.sh -i "localhost:41050" -q "describe part_tbl"{code}
Run a loop of add+drop partition in cluster1
{code:bash}
while true; do impala-shell.sh --quiet -B -q "ALTER TABLE part_tbl ADD
PARTITION (p=1); ALTER TABLE part_tbl DROP PARTITION (p=1);" > /dev/null;
done{code}
Using another session, run a loop of REFRESH partition in cluster1. More
sessions doing this seems increasing the possibility of hitting the bug.
{code:bash}
while true; do impala-shell.sh --quiet -B -q "REFRESH part_tbl PARTITION (p=1)"
> /dev/null; done{code}
Then observe the event-processor status of cluster2. It stops in several
minutes.
> IllegalStateException in processing RELOAD events due to malformed HMS
> Partition objects
> ----------------------------------------------------------------------------------------
>
> Key: IMPALA-12856
> URL: https://issues.apache.org/jira/browse/IMPALA-12856
> Project: IMPALA
> Issue Type: Bug
> Components: Catalog
> Reporter: Quanlong Huang
> Priority: Critical
>
> When processing RELOAD events on partitions, catalogd fetch the Partition
> objects from HMS. The returned Partition objects could be malformed which
> causes an IllegalStateException and stops the event-processor. This was
> observed when a partition is added and dropped in a loop.
> {noformat}
> E0229 15:19:27.945312 12668 MetastoreEventsProcessor.java:990] Unexpected
> exception received while processing event
> Java exception follows:
> java.lang.IllegalStateException
> at
> com.google.common.base.Preconditions.checkState(Preconditions.java:496)
> at
> org.apache.impala.catalog.HdfsTable.getTypeCompatiblePartValues(HdfsTable.java:2598)
> at
> org.apache.impala.catalog.HdfsTable.reloadPartitionsFromNames(HdfsTable.java:2856)
> at
> org.apache.impala.service.CatalogOpExecutor.reloadPartitionsFromNamesIfExists(CatalogOpExecutor.java:4805)
> at
> org.apache.impala.service.CatalogOpExecutor.reloadPartitionsIfExist(CatalogOpExecutor.java:4742)
> at
> org.apache.impala.catalog.events.MetastoreEvents$MetastoreTableEvent.reloadPartitions(MetastoreEvents.java:1050)
> at
> org.apache.impala.catalog.events.MetastoreEvents$ReloadEvent.processPartitionReload(MetastoreEvents.java:2941)
> at
> org.apache.impala.catalog.events.MetastoreEvents$ReloadEvent.processTableEvent(MetastoreEvents.java:2906)
> at
> org.apache.impala.catalog.events.MetastoreEvents$MetastoreTableEvent.process(MetastoreEvents.java:1248)
> at
> org.apache.impala.catalog.events.MetastoreEvents$MetastoreEvent.processIfEnabled(MetastoreEvents.java:672)
> at
> org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:1164)
> at
> org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:972)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:750)
> E0229 15:19:27.963455 12668 MetastoreEventsProcessor.java:1251] Event id:
> 8697728
> Event Type: RELOAD
> Event time: 1709191166
> Database name: default
> Table name: part_tbl
> Event message: H4s{noformat}
> The failed check is asserting the number of partition columns cached in
> catalogd matches the number of partition values from the HMS object:
> {code:java}
> public List<LiteralExpr> getTypeCompatiblePartValues(List<String> values) {
> List<LiteralExpr> result = new ArrayList<>();
> List<Column> partitionColumns = getClusteringColumns();
> Preconditions.checkState(partitionColumns.size() == values.size()); //
> This failed{code}
> After adding some debug logs, I found the Partition obejct got from HMS had
> an empty values list:
> {noformat}
> I0229 16:04:04.679625 25867 HdfsTable.java:2829] HMS Partition:
> Partition(values:[], dbName:default, tableName:part_tbl,
> createTime:1709193844, lastAccessTime:0,
> sd:StorageDescriptor(cols:[FieldSchema(name:i, type:int, comment:null)],
> location:hdf
> s://localhost:20500/test-warehouse/part_tbl/p=1,
> inputFormat:org.apache.hadoop.mapred.TextInputFormat,
> outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,
> compressed:false, numBuckets:0, serdeInfo:SerDeInfo(name:null, serializ
> ationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{}),
> bucketCols:[], sortCols:[], parameters:{},
> skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[],
> skewedColValueLocationMaps:{}), storedAsSubDirectories:false), par
> ameters:{}, catName:hive, writeId:0)
> I0229 16:04:04.680133 25867 MetastoreEventsProcessor.java:1189] Time elapsed
> in processing event batch: 17.145ms
> E0229 16:04:04.680475 25867 MetastoreEventsProcessor.java:990] Unexpected
> exception received while processing event
> Java exception follows:
> java.lang.IllegalStateException: partitionColumns.size(): 1, values.size():
> 0. partCols: Column{name_=p, type_=INT, comment_=null,
> stats=ColumnStats{avgSize_=4.0, avgSerializedSize_=4.0, maxSize_=4,
> numDistinct_=-1, numNulls_=-1, numTrues=-1, numFalses=-1, lowValue=-1,
> highValue=-1}, position_=0}, partValues: {noformat}
> EventProcessor should recover in this case.
> On the other hand, returning a Partition object with empty partition value
> list seems a bug of HMS. CC [~hemanth619], [~VenuReddy]
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]