[ 
https://issues.apache.org/jira/browse/HUDI-8648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17903474#comment-17903474
 ] 

Lin Liu commented on HUDI-8648:
-------------------------------

This test may have exposed two issues:
 # The delete operation may not work correctly: records with the specified 
keys are not removed, and in that case their SI entries are not removed either.
 # The secondary index (SI) itself has a bug that leaves multiple entries for 
the same primary key.

I reran the tests, kept the table that reported the error, and ran some 
analysis on it.
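
Both suspicions can be checked mechanically once the SI keys and the live record keys are collected as plain strings. The following is only a minimal sketch: `SiCheck` and its methods are hypothetical helper names, not Hudi APIs, and the `<secondaryKey>$<recordKey>` layout is assumed from the sample keys in the evidence below.

```scala
// Hypothetical helper, not part of Hudi. It works on plain strings copied
// from the `key` column of the secondary index (SI) partition.
object SiCheck {
  // Assumption: SI keys look like "<secondaryKey>$<recordKey>", as in the
  // sample output (e.g. "rider-004$b792ad9a-..."). Split on the last '$'.
  def splitKey(siKey: String): (String, String) = {
    val i = siKey.lastIndexOf('$')
    require(i >= 0, "not an SI key: " + siKey)
    (siKey.substring(0, i), siKey.substring(i + 1))
  }

  // Issue 2: record keys that appear under more than one secondary key.
  def duplicated(siKeys: Seq[String]): Map[String, Set[String]] =
    siKeys.map(splitKey)
      .groupBy(_._2)
      .map { case (recordKey, pairs) => recordKey -> pairs.map(_._1).toSet }
      .filter(_._2.size > 1)

  // Issue 1, index side: SI entries whose record key is no longer live.
  def orphaned(siKeys: Seq[String], liveRecordKeys: Set[String]): Seq[String] =
    siKeys.filterNot(k => liveRecordKeys.contains(splitKey(k)._2))

  // Issue 1, data side: keys that were deleted but are still readable.
  def survivingDeletes(deleted: Set[String], liveRecordKeys: Set[String]): Set[String] =
    deleted.intersect(liveRecordKeys)
}
```

In this sketch, a non-empty result from `duplicated` would point at the index, while a non-empty result from `survivingDeletes` would point at the delete path.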

Evidence 1:
{code:java}
+-------------------+----------------------+------------------------------------+----------------------+------------------------------------------------------------------------+------------------+------------------------------------+--------------------+-------------------+--------+----------+--------------------+-------------------+------------------+----------+--------------+---------+---------+---------+------------+
|_hoodie_commit_time|_hoodie_commit_seqno  |_hoodie_record_key                  |_hoodie_partition_path|_hoodie_file_name                                                       |_hoodie_is_deleted|_row_key                            |begin_lat           |begin_lon          |currency|driver    |end_lat             |end_lon            |fare              |partition |partition_path|rider    |timestamp|trip_type|stage       |
+-------------------+----------------------+------------------------------------+----------------------+------------------------------------------------------------------------+------------------+------------------------------------+--------------------+-------------------+--------+----------+--------------------+-------------------+------------------+----------+--------------+---------+---------+---------+------------+
|20241205080329800  |20241205080329800_0_3 |08402769-0bb0-484d-8966-82b4d4198c4d|2016/03/15            |f6d65e8e-8468-4dcb-bb00-0d234419a462-0_0-17-47_20241205080329800.parquet|false             |08402769-0bb0-484d-8966-82b4d4198c4d|0.3382497975681562  |0.961985826451825  |USD     |driver-002|0.6241832400570156  |0.08846099424392817|32.023973976968314|2016/03/15|2016/03/15    |rider-002|0        |BLACK    |after_insert|

+----------------------------------------------+------------+
|key                                           |after_insert|
+----------------------------------------------+------------+
|rider-002$08402769-0bb0-484d-8966-82b4d4198c4d|after_insert|
|rider-002$61bfda59-7438-450e-aa6f-525deb182edf|after_insert|
|rider-002$fde65748-6445-4efe-92d5-0b984be3ea1c|after_insert|
|rider-002$ec76535b-d07b-4e3d-bea3-803d51de159e|after_insert|
|rider-002$1413ef07-52fa-47cd-8b9c-1d10f6b57752|after_insert|
+----------------------------------------------+------------+

=================================================================================

|20241205080340852  |20241205080340852_0_22|08402769-0bb0-484d-8966-82b4d4198c4d|2016/03/15            |f6d65e8e-8468-4dcb-bb00-0d234419a462-0|false             |08402769-0bb0-484d-8966-82b4d4198c4d|0.9397066577561988  |0.6170805618950493  |USD     |driver-003|0.08486621839537178 |0.04367595838336369|31.532008376488296|2016/03/15|2016/03/15    |rider-003|0        |UBERX    |after_update1|

+----------------------------------------------+-------------+
|key                                           |after_update1|
+----------------------------------------------+-------------+
|rider-003$ec76535b-d07b-4e3d-bea3-803d51de159e|after_update1|
|rider-003$61bfda59-7438-450e-aa6f-525deb182edf|after_update1|
|rider-003$08402769-0bb0-484d-8966-82b4d4198c4d|after_update1|
|rider-003$fde65748-6445-4efe-92d5-0b984be3ea1c|after_update1|
|rider-003$1413ef07-52fa-47cd-8b9c-1d10f6b57752|after_update1|
+----------------------------------------------+-------------+

=================================================================================

|20241205080408257  |20241205080408257_0_99 |08402769-0bb0-484d-8966-82b4d4198c4d|2016/03/15            |78a46571-45f7-4099-a73f-4abd50c3b99b-0_2-574-2638_20241205080410549.parquet|false             |08402769-0bb0-484d-8966-82b4d4198c4d|0.3809986196082633 |0.04897870472022914 |USD     |driver-004|0.06203977575913755|0.888464125532037  |38.84660087895085 |2016/03/15|2016/03/15    |rider-004|0        |UBERX    |after_update2|

+----------------------------------------------+-------------+
|key                                           |after_update2|
+----------------------------------------------+-------------+
|rider-004$61bfda59-7438-450e-aa6f-525deb182edf|after_update2|
|rider-004$08402769-0bb0-484d-8966-82b4d4198c4d|after_update2|
|rider-004$ec76535b-d07b-4e3d-bea3-803d51de159e|after_update2|
|rider-004$1413ef07-52fa-47cd-8b9c-1d10f6b57752|after_update2|
|rider-004$fde65748-6445-4efe-92d5-0b984be3ea1c|after_update2|
+----------------------------------------------+-------------+

=================================================================================

|20241205080439421  |20241205080439421_0_132|08402769-0bb0-484d-8966-82b4d4198c4d|2016/03/15            |78a46571-45f7-4099-a73f-4abd50c3b99b-0|false             |08402769-0bb0-484d-8966-82b4d4198c4d|0.055057743379837376|0.7801762703617553 |USD     |driver-005|0.11169136641284616|0.6727625196189504 |22.00554827947049 |2016/03/15|2016/03/15    |rider-005|0        |UBERX    |after_update3|

+----------------------------------------------+-------------+
|key                                           |after_update3|
+----------------------------------------------+-------------+
|rider-005$61bfda59-7438-450e-aa6f-525deb182edf|after_update3|
|rider-005$08402769-0bb0-484d-8966-82b4d4198c4d|after_update3|
|rider-005$ec76535b-d07b-4e3d-bea3-803d51de159e|after_update3|
|rider-005$fde65748-6445-4efe-92d5-0b984be3ea1c|after_update3|
|rider-005$1413ef07-52fa-47cd-8b9c-1d10f6b57752|after_update3|
+----------------------------------------------+-------------+

=================================================================================

|20241205080439421  |20241205080439421_0_132|08402769-0bb0-484d-8966-82b4d4198c4d|2016/03/15            |78a46571-45f7-4099-a73f-4abd50c3b99b-0|false             |08402769-0bb0-484d-8966-82b4d4198c4d|0.055057743379837376|0.7801762703617553 |USD     |driver-005|0.11169136641284616|0.6727625196189504 |22.00554827947049 |2016/03/15|2016/03/15    |rider-005|0        |UBERX    |after_delete|

+----------------------------------------------+------------+
|key                                           |after_delete|
+----------------------------------------------+------------+
|rider-005$08402769-0bb0-484d-8966-82b4d4198c4d|after_delete|
+----------------------------------------------+------------+

We can see that the record is not removed from the table, so its SI entry is 
not removed either.
 {code}
Evidence 2:
{code:java}
scala> df.filter("_row_key = 'b792ad9a-b378-42d4-8677-cf081951bdfc'").show(false)
+-------------------+--------------------+------------------+----------------------+-----------------+------------------+--------+---------+---------+--------+------+-------+-------+----+---------+--------------+-----+---------+---------+
|_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|_hoodie_file_name|_hoodie_is_deleted|_row_key|begin_lat|begin_lon|currency|driver|end_lat|end_lon|fare|partition|partition_path|rider|timestamp|trip_type|
+-------------------+--------------------+------------------+----------------------+-----------------+------------------+--------+---------+---------+--------+------+-------+-------+----+---------+--------------+-----+---------+---------+
+-------------------+--------------------+------------------+----------------------+-----------------+------------------+--------+---------+---------+--------+------+-------+-------+----+---------+--------------+-----+---------+---------+
scala> sidf.filter("contains(key, 'b792ad9a-b378-42d4-8677-cf081951bdfc')").show(false)
+----------------------------------------------+----+------------------+-------------------+-------------------+-------------------+----------------------+
|key                                           |type|filesystemMetadata|BloomFilterMetadata|ColumnStatsMetadata|recordIndexMetadata|SecondaryIndexMetadata|
+----------------------------------------------+----+------------------+-------------------+-------------------+-------------------+----------------------+
|rider-004$b792ad9a-b378-42d4-8677-cf081951bdfc|7   |null              |null               |null               |null               |{false}               |
+----------------------------------------------+----+------------------+-------------------+-------------------+-------------------+----------------------+

+----------------------------------------------+-------------+
|key                                           |after_update2|
+----------------------------------------------+-------------+
|rider-004$d853cd23-bdcf-4f48-857b-efa8e7b37537|after_update2|
|rider-004$51c7db5f-6d9f-420c-b3bb-affd391a8e53|after_update2|
|rider-004$b792ad9a-b378-42d4-8677-cf081951bdfc|after_update2|
|rider-004$2a220e81-3d6c-4877-a11d-7318352430aa|after_update2|
|rider-004$eef4e0ef-9630-4b2d-9e87-3c058d96878b|after_update2|
+----------------------------------------------+-------------+

clustering <---- My guess: clustering produces another index entry without 
cleaning up the previous one. Let us verify this.
update3

+----------------------------------------------+-------------+
|key                                           |after_update3|
+----------------------------------------------+-------------+
|rider-005$b792ad9a-b378-42d4-8677-cf081951bdfc|after_update3|
|rider-004$b792ad9a-b378-42d4-8677-cf081951bdfc|after_update3| <<------------------------ Not removed.
|rider-005$eef4e0ef-9630-4b2d-9e87-3c058d96878b|after_update3|
|rider-005$d853cd23-bdcf-4f48-857b-efa8e7b37537|after_update3|
|rider-005$51c7db5f-6d9f-420c-b3bb-affd391a8e53|after_update3|
|rider-005$2a220e81-3d6c-4877-a11d-7318352430aa|after_update3|
+----------------------------------------------+-------------+

Since there is a clustering happening after update3, let us check their order.

-rw-r--r--@  1 linliu  staff  2561 Dec  5 09:13 20241205171301747.deltacommit.inflight
-rw-r--r--@  1 linliu  staff     0 Dec  5 09:13 20241205171301747.deltacommit.requested
-rw-r--r--@  1 linliu  staff  3905 Dec  5 09:13 20241205171301747_20241205171307635.deltacommit
-rw-r--r--@  1 linliu  staff     0 Dec  5 09:13 20241205171308218.indexing.inflight
-rw-r--r--@  1 linliu  staff   828 Dec  5 09:13 20241205171308218.indexing.requested
-rw-r--r--@  1 linliu  staff  1148 Dec  5 09:13 20241205171308218_20241205171310354.indexing
-rw-r--r--@  1 linliu  staff  2834 Dec  5 09:13 20241205171315938.deltacommit.inflight
-rw-r--r--@  1 linliu  staff     0 Dec  5 09:13 20241205171315938.deltacommit.requested
-rw-r--r--@  1 linliu  staff  4181 Dec  5 09:13 20241205171315938_20241205171317797.deltacommit
-rw-r--r--@  1 linliu  staff  2834 Dec  5 09:13 20241205171344951.deltacommit.inflight
-rw-r--r--@  1 linliu  staff     0 Dec  5 09:13 20241205171344951.deltacommit.requested
-rw-r--r--@  1 linliu  staff  4187 Dec  5 09:13 20241205171344951_20241205171346135.deltacommit
-rw-r--r--@  1 linliu  staff     0 Dec  5 09:13 20241205171346227.compaction.inflight
-rw-r--r--@  1 linliu  staff  3401 Dec  5 09:13 20241205171346227.compaction.requested
-rw-r--r--@  1 linliu  staff  3954 Dec  5 09:13 20241205171346227_20241205171346943.commit
-rw-r--r--@  1 linliu  staff     0 Dec  5 09:13 20241205171347019.clustering.inflight
-rw-r--r--@  1 linliu  staff  3785 Dec  5 09:13 20241205171347019.clustering.requested
-rw-r--r--@  1 linliu  staff  4257 Dec  5 09:13 20241205171347019_20241205171348097.replacecommit
-rw-r--r--@  1 linliu  staff  2834 Dec  5 09:14 20241205171413223.deltacommit.inflight
-rw-r--r--@  1 linliu  staff     0 Dec  5 09:14 20241205171413223.deltacommit.requested
-rw-r--r--@  1 linliu  staff  4193 Dec  5 09:14 20241205171413223_20241205171414466.deltacommit
-rw-r--r--@  1 linliu  staff  2546 Dec  5 09:14 20241205171440060.deltacommit.inflight
-rw-r--r--@  1 linliu  staff     0 Dec  5 09:14 20241205171440060.deltacommit.requested
-rw-r--r--@  1 linliu  staff  3619 Dec  5 09:14 20241205171440060_20241205171441699.deltacommit
 {code}
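
To read the order of operations off a listing like the one above without eyeballing it, the completed instants can be parsed and sorted by completion time. This is a rough sketch: `TimelineOrder` and `Completed` are hypothetical names, and the `<requested>_<completed>.<action>` file-name pattern is assumed only from the names in this listing, not taken from Hudi's timeline code.

```scala
// Hypothetical helper, not part of Hudi. It only interprets file names.
object TimelineOrder {
  // One completed instant: requested time, completion time, action.
  final case class Completed(requested: String, completed: String, action: String)

  // Assumption from the listing: completed instants are named
  // "<requested>_<completed>.<action>" with 17-digit timestamps. Pending files
  // ("<requested>.<action>.inflight" / ".requested") do not match and are skipped.
  private val CompletedName = """(\d{17})_(\d{17})\.(\w+)""".r

  // Fixed-width digit strings sort lexicographically in time order.
  def completed(fileNames: Seq[String]): Seq[Completed] =
    fileNames
      .collect { case CompletedName(req, done, action) => Completed(req, done, action) }
      .sortBy(_.completed)
}
```

Applied to the names above, the `replacecommit` produced by clustering (`20241205171347019`) sorts after the deltacommit `20241205171344951` and the compaction commit, and before the deltacommit `20241205171413223`.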

> Flaky test: TestSecondaryIndex. "Secondary Index With Updates Compaction 
> Clustering Deletes"
> --------------------------------------------------------------------------------------------
>
>                 Key: HUDI-8648
>                 URL: https://issues.apache.org/jira/browse/HUDI-8648
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Lin Liu
>            Assignee: Lin Liu
>            Priority: Major
>
> The error stack:
>  
> {code:java}
> - Test Secondary Index With Updates Compaction Clustering Deletes *** FAILED ***
>   org.opentest4j.AssertionFailedError: expected: <true> but was: <false>
>   at org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:55)
>   at org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:40)
>   at org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:35)
>   at org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:179)
>   at org.apache.spark.sql.hudi.command.index.TestSecondaryIndex.$anonfun$validateSecondaryIndex$1(TestSecondaryIndex.scala:547)
>   at org.apache.spark.sql.hudi.command.index.TestSecondaryIndex.$anonfun$validateSecondaryIndex$1$adapted(TestSecondaryIndex.scala:540)
>   at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
>   at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
>   at org.apache.spark.sql.hudi.command.index.TestSecondaryIndex.validateSecondaryIndex(TestSecondaryIndex.scala:540)
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
