[ https://issues.apache.org/jira/browse/IMPALA-13691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17916588#comment-17916588 ]

Quanlong Huang commented on IMPALA-13691:
-----------------------------------------

The failure happens in this assertion:
{code:java}
      Preconditions.checkArgument(oldPartition == null
          || HdfsPartition.comparePartitionKeyValues(
              oldPartition.getPartitionValues(), partBuilder.getPartitionValues()) == 0);{code}
https://github.com/apache/impala/blob/5371e0c6df3e329398712af3ebb739465b947454/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L3054-L3056
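To make the failure concrete, here is a minimal, self-contained sketch that trips the same check with the two values involved. Note that comparePartitionKeyValues below is only a hypothetical stand-in that compares the values pairwise as plain strings, not the real HdfsPartition method:
{code:java}
import com.google.common.base.Preconditions;
import java.util.List;

public class PartitionValueCheckDemo {
  // Hypothetical stand-in for HdfsPartition.comparePartitionKeyValues:
  // compares the partition key values pairwise as plain strings.
  static int comparePartitionKeyValues(List<String> a, List<String> b) {
    if (a.size() != b.size()) return Integer.compare(a.size(), b.size());
    for (int i = 0; i < a.size(); i++) {
      int cmp = a.get(i).compareTo(b.get(i));
      if (cmp != 0) return cmp;
    }
    return 0;
  }

  public static void main(String[] args) {
    // The two values from the failure: the partition Impala resolved vs.
    // the value carried by the INSERT event.
    List<String> oldPartitionValues = List.of("2024-09-10 00:00:00");
    List<String> eventPartitionValues = List.of("2024-09-10 00%3A00%3A00");
    // Throws IllegalArgumentException, matching the stack trace in the
    // issue description below.
    Preconditions.checkArgument(
        comparePartitionKeyValues(oldPartitionValues, eventPartitionValues) == 0);
  }
}{code}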

In that assertion, 'oldPartition' is not null, but its partition value (2024-09-10 00:00:00) differs from the one we got from the INSERT event (2024-09-10 00%3A00%3A00). The bug seems to be that we always URL-decode the partition value, even when it comes from the HMS event:
{code:java}
  private Pair<String, LiteralExpr> getPartitionExprFromValue(
      String partValue, Type type) {
    LiteralExpr expr;
    // URL decode the partition value since it may contain encoded URL.
    String value = FileUtils.unescapePathName(partValue);{code}
[https://github.com/apache/impala/blob/5371e0c6df3e329398712af3ebb739465b947454/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L2719]

So "2024-09-10 00%3A00%3A00" in the INSERT event becomes "2024-09-10 00:00:00" 
and happens to match another partition.
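For illustration, here is a minimal, self-contained sketch of the collision. The decode loop is a simplified re-implementation of the %XY unescaping done by Hive's FileUtils.unescapePathName, not the actual Hive code:
{code:java}
public class PartitionDecodeCollision {
  // Simplified re-implementation of the unescaping in Hive's
  // FileUtils.unescapePathName: turns every %XY hex escape back into
  // the character it encodes.
  static String unescapePathName(String path) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < path.length(); i++) {
      char c = path.charAt(i);
      if (c == '%' && i + 2 < path.length()) {
        try {
          int code = Integer.parseInt(path.substring(i + 1, i + 3), 16);
          if (code >= 0) {
            sb.append((char) code);
            i += 2;
            continue;
          }
        } catch (NumberFormatException e) {
          // Not a valid %XY escape; keep the '%' literally.
        }
      }
      sb.append(c);
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    String fromEvent = "2024-09-10 00%3A00%3A00"; // value in the INSERT event
    String existing = "2024-09-10 00:00:00";      // a different, pre-existing partition
    System.out.println(unescapePathName(fromEvent));                  // 2024-09-10 00:00:00
    System.out.println(unescapePathName(fromEvent).equals(existing)); // true: collision
  }
}{code}
The decoded value then resolves to the pre-existing "2024-09-10 00:00:00" partition as 'oldPartition', while 'partBuilder' keeps the event's raw value, which is exactly the mismatch the precondition rejects.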

> Processing INSERT event failed by partition values mismatch
> -----------------------------------------------------------
>
>                 Key: IMPALA-13691
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13691
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Critical
>
> Create a partitioned table:
> {code:sql}
> create external table test_part (i int) partitioned by (s string);{code}
> Add the following partition folders inside the table location:
> {code:bash}
> TBL_DIR=hdfs://localhost:20500/test-warehouse/test_part
> hdfs dfs -mkdir "$TBL_DIR/s=2024-09-09 00%25253A00%25253A00"
> hdfs dfs -mkdir "$TBL_DIR/s=2024-09-09 00%253A00%253A00"
> hdfs dfs -mkdir "$TBL_DIR/s=2024-09-09 00%3A00%3A00"
> hdfs dfs -mkdir "$TBL_DIR/s=2024-09-10 00%25253A00%25253A00"
> hdfs dfs -mkdir "$TBL_DIR/s=2024-09-10 00%253A00%253A00"
> hdfs dfs -mkdir "$TBL_DIR/s=2024-09-10 00%3A00%3A00"
> hdfs dfs -mkdir "$TBL_DIR/s=2025-01-21 00%253A00%253A00"
> hdfs dfs -mkdir "$TBL_DIR/s=2025-01-21 00%3A00%3A00"
> hdfs dfs -mkdir "$TBL_DIR/s=2025-01-22 00%3A00%3A00"{code}
> Note that %25 is the URL encoding of "%", so "00%25253A00%25253A00" decodes to "00%253A00%253A00", which decodes to "00%3A00%3A00", which decodes to "00:00:00".
> In Impala, create the partitions by ALTER TABLE RECOVER PARTITIONS:
> {code:sql}
> impala> alter table test_part recover partitions;{code}
> The partition values are inconsistent with the partition folders:
> {noformat}
> Query: show partitions test_part
> +-----------------------------+-------+--------+--------+--------------+-------------------+---------+-------------------+-----------------------------------------------------------------------------------+-----------+
> | s                           | #Rows | #Files | Size   | Bytes Cached | Cache Replication | Format  | Incremental stats | Location                                                                            | EC Policy |
> +-----------------------------+-------+--------+--------+--------------+-------------------+---------+-------------------+-----------------------------------------------------------------------------------+-----------+
> | 2024-09-09 00%253A00%253A00 | -1    | 0      | 0B     | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://localhost:20500/test-warehouse/test_part/s=2024-09-09 00%25253A00%25253A00  | NONE      |
> | 2024-09-09 00%3A00%3A00     | -1    | 0      | 0B     | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://localhost:20500/test-warehouse/test_part/s=2024-09-09 00%253A00%253A00      | NONE      |
> | 2024-09-09 00:00:00         | -1    | 0      | 0B     | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://localhost:20500/test-warehouse/test_part/s=2024-09-09 00%3A00%3A00          | NONE      |
> | 2024-09-10 00%253A00%253A00 | -1    | 0      | 0B     | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://localhost:20500/test-warehouse/test_part/s=2024-09-10 00%25253A00%25253A00  | NONE      |
> | 2024-09-10 00%3A00%3A00     | -1    | 4      | 1.70KB | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://localhost:20500/test-warehouse/test_part/s=2024-09-10 00%253A00%253A00      | NONE      |
> | 2024-09-10 00:00:00         | -1    | 0      | 0B     | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://localhost:20500/test-warehouse/test_part/s=2024-09-10 00%3A00%3A00          | NONE      |
> | 2025-01-21 00%3A00%3A00     | -1    | 0      | 0B     | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://localhost:20500/test-warehouse/test_part/s=2025-01-21 00%253A00%253A00      | NONE      |
> | 2025-01-21 00:00:00         | -1    | 0      | 0B     | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://localhost:20500/test-warehouse/test_part/s=2025-01-21 00%3A00%3A00          | NONE      |
> | 2025-01-22 00:00:00         | -1    | 0      | 0B     | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://localhost:20500/test-warehouse/test_part/s=2025-01-22 00%3A00%3A00          | NONE      |
> | Total                       | -1    | 4      | 1.70KB | 0B           |                   |         |                   |                                                                                     |           |
> +-----------------------------+-------+--------+--------+--------------+-------------------+---------+-------------------+-----------------------------------------------------------------------------------+-----------+{noformat}
> INSERT one partition in Hive:
> {code:sql}
> hive> insert into test_part partition(s="2024-09-10 00%3A00%3A00") values (0);{code}
> The EventProcessor in catalogd failed to process the INSERT event:
> {noformat}
> E0124 12:37:52.303791 1926240 MetastoreEventsProcessor.java:1098] Unexpected exception received while processing event
> Java exception follows:
> java.lang.IllegalArgumentException
>         at com.google.common.base.Preconditions.checkArgument(Preconditions.java:129)
>         at org.apache.impala.catalog.HdfsTable.reloadPartitions(HdfsTable.java:3054)
>         at org.apache.impala.catalog.HdfsTable.reloadPartitionsFromNames(HdfsTable.java:2946)
>         at org.apache.impala.service.CatalogOpExecutor.reloadPartitionsFromNamesIfExists(CatalogOpExecutor.java:5092)
>         at org.apache.impala.service.CatalogOpExecutor.reloadPartitionsIfExist(CatalogOpExecutor.java:5021)
>         at org.apache.impala.catalog.events.MetastoreEvents$MetastoreTableEvent.reloadPartitions(MetastoreEvents.java:1112)
>         at org.apache.impala.catalog.events.MetastoreEvents$InsertEvent.processPartitionInserts(MetastoreEvents.java:1671)
>         at org.apache.impala.catalog.events.MetastoreEvents$InsertEvent.processTableEvent(MetastoreEvents.java:1653)
>         at org.apache.impala.catalog.events.MetastoreEvents$MetastoreTableEvent.process(MetastoreEvents.java:1339)
>         at org.apache.impala.catalog.events.MetastoreEvents$MetastoreEvent.processIfEnabled(MetastoreEvents.java:701)
>         at org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:1336)
>         at org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:1079)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:750)
> E0124 12:37:52.306558 1926240 MetastoreEventsProcessor.java:1436] Event id: 38879
> Event Type: INSERT
> Event time: 1737693396
> Database name: default
> Table name: test_part
> Event message: H4sIAAAAAAAAAO1WW2+bMBj9KxPV3ggxJoUQaQ9pS7VM3VqlTJu0TJELTvHkYGqbTl2V/z5fSBsovajawx6ah9h85/D5u/mIW0dgfo25M3FkwclKToZDyjJECybkJAbjwHENhWT4jJMyIxWiiqys+YVac7xCNZXqUaILirUbLOSyQvzOlt5U2p58T5P5l+nJMp0enCRb8PTi1yfBSoXfLhx/4UzUIiRXm8W9p4WzcRcObKPNyRYL2thVjUrKyksLjixIAu3Bj4IojAM/ijW0vwsBbQkfWCJr4TizmyZKKtTZkx8N4PruwwRIb+Ck1EFvfvZARb4SrQZAsA/AUBdi8BtxXLBa4GGnLp3cGb/0UIWyAnsFyhmrvIJcY++KeoR56q2rGkvvM6o4zs/s06ysannM+BrJVsFe7/G0lh2XTaHl6uV1hq+IQk1qjr0mio8KP8f8CLfqtEaV7Ztx7G5X4N6qlmw0cdxpcEMwDYt7m28xH7zgBM3zn5mo3QNhBzObButmZLGejHzYKn9vlo+PsegdY7USc8M2u4V5JPdAu90qgHk9nX9NDEFyVAqCS7mkSMijnKZkjQ3l/qoa4unBp8Pp2fRgdjJLZ8m5oSiX82R65Kr123yWKo9NiTvBtsXH5uM/mEk/6lxHNUANd9zSEGNqMibhaPs+DHdNMBzDzUYJXSXLPpnrtNvXDgAcDUA88ME7AN4H0+Zv4fSJxVMC2JGIXgWM4ZMK+B/q3VB8aFcI7psa2eVNDd/U8NVqGD0Z7EgP+7NCBcehVTQmET0nfyw4joCxlvX6mFAsEo5EzfEhy3FuCG3YmOBWx+JHBQnsSs3AN0IjVUBConV1d8mDOHRVZSuKMv0NtkJUYEVc6ZNUnv/2Ag6B+S3BMmPVzTLY29tTwvUXNRWTVGIKAAA=
> W0124 12:37:52.306648 1926240 MetastoreEventsProcessor.java:1067] Event processing is skipped since status is ERROR. Last synced event id is 38878{noformat}
> Note that to reproduce the issue after IMPALA-12832, you need to launch 
> catalogd with "--invalidate_metadata_on_event_processing_failure=false".
> CC [~hemanth619], [~VenuReddy]


