[ https://issues.apache.org/jira/browse/HUDI-8648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17903474#comment-17903474 ]
Lin Liu commented on HUDI-8648:
-------------------------------
This test may have exposed two issues:
# The delete operation may not work correctly: the records with the specified
keys are not removed from the table. In that case, the secondary index (SI)
entries are not removed either.
# The SI itself may have a problem that leaves multiple entries for the
same primary key.
To investigate, I reran the tests, kept the table that reported the error, and
ran some analysis on it.
Evidence 1:
{code:java}
+-------------------+----------------------+------------------------------------+----------------------+------------------------------------------------------------------------+------------------+------------------------------------+--------------------+-------------------+--------+----------+--------------------+-------------------+------------------+----------+--------------+---------+---------+---------+------------+
|_hoodie_commit_time|_hoodie_commit_seqno |_hoodie_record_key |_hoodie_partition_path|_hoodie_file_name |_hoodie_is_deleted|_row_key |begin_lat |begin_lon |currency|driver |end_lat |end_lon |fare |partition |partition_path|rider |timestamp|trip_type|stage |
+-------------------+----------------------+------------------------------------+----------------------+------------------------------------------------------------------------+------------------+------------------------------------+--------------------+-------------------+--------+----------+--------------------+-------------------+------------------+----------+--------------+---------+---------+---------+------------+
|20241205080329800 |20241205080329800_0_3 |08402769-0bb0-484d-8966-82b4d4198c4d|2016/03/15 |f6d65e8e-8468-4dcb-bb00-0d234419a462-0_0-17-47_20241205080329800.parquet|false |08402769-0bb0-484d-8966-82b4d4198c4d|0.3382497975681562 |0.961985826451825 |USD |driver-002|0.6241832400570156 |0.08846099424392817|32.023973976968314|2016/03/15|2016/03/15 |rider-002|0 |BLACK |after_insert|
+----------------------------------------------+------------+
|key |after_insert|
+----------------------------------------------+------------+
|rider-002$08402769-0bb0-484d-8966-82b4d4198c4d|after_insert|
|rider-002$61bfda59-7438-450e-aa6f-525deb182edf|after_insert|
|rider-002$fde65748-6445-4efe-92d5-0b984be3ea1c|after_insert|
|rider-002$ec76535b-d07b-4e3d-bea3-803d51de159e|after_insert|
|rider-002$1413ef07-52fa-47cd-8b9c-1d10f6b57752|after_insert|
+----------------------------------------------+------------+
=================================================================================
|20241205080340852 |20241205080340852_0_22|08402769-0bb0-484d-8966-82b4d4198c4d|2016/03/15 |f6d65e8e-8468-4dcb-bb00-0d234419a462-0|false |08402769-0bb0-484d-8966-82b4d4198c4d|0.9397066577561988 |0.6170805618950493 |USD |driver-003|0.08486621839537178 |0.04367595838336369|31.532008376488296|2016/03/15|2016/03/15 |rider-003|0 |UBERX |after_update1|
+----------------------------------------------+-------------+
|key |after_update1|
+----------------------------------------------+-------------+
|rider-003$ec76535b-d07b-4e3d-bea3-803d51de159e|after_update1|
|rider-003$61bfda59-7438-450e-aa6f-525deb182edf|after_update1|
|rider-003$08402769-0bb0-484d-8966-82b4d4198c4d|after_update1|
|rider-003$fde65748-6445-4efe-92d5-0b984be3ea1c|after_update1|
|rider-003$1413ef07-52fa-47cd-8b9c-1d10f6b57752|after_update1|
+----------------------------------------------+-------------+
=================================================================================
|20241205080408257 |20241205080408257_0_99 |08402769-0bb0-484d-8966-82b4d4198c4d|2016/03/15 |78a46571-45f7-4099-a73f-4abd50c3b99b-0_2-574-2638_20241205080410549.parquet|false |08402769-0bb0-484d-8966-82b4d4198c4d|0.3809986196082633 |0.04897870472022914 |USD |driver-004|0.06203977575913755|0.888464125532037 |38.84660087895085 |2016/03/15|2016/03/15 |rider-004|0 |UBERX |after_update2|
+----------------------------------------------+-------------+
|key |after_update2|
+----------------------------------------------+-------------+
|rider-004$61bfda59-7438-450e-aa6f-525deb182edf|after_update2|
|rider-004$08402769-0bb0-484d-8966-82b4d4198c4d|after_update2|
|rider-004$ec76535b-d07b-4e3d-bea3-803d51de159e|after_update2|
|rider-004$1413ef07-52fa-47cd-8b9c-1d10f6b57752|after_update2|
|rider-004$fde65748-6445-4efe-92d5-0b984be3ea1c|after_update2|
+----------------------------------------------+-------------+
=================================================================================
|20241205080439421 |20241205080439421_0_132|08402769-0bb0-484d-8966-82b4d4198c4d|2016/03/15 |78a46571-45f7-4099-a73f-4abd50c3b99b-0|false |08402769-0bb0-484d-8966-82b4d4198c4d|0.055057743379837376|0.7801762703617553 |USD |driver-005|0.11169136641284616|0.6727625196189504 |22.00554827947049 |2016/03/15|2016/03/15 |rider-005|0 |UBERX |after_update3|
+----------------------------------------------+-------------+
|key |after_update3|
+----------------------------------------------+-------------+
|rider-005$61bfda59-7438-450e-aa6f-525deb182edf|after_update3|
|rider-005$08402769-0bb0-484d-8966-82b4d4198c4d|after_update3|
|rider-005$ec76535b-d07b-4e3d-bea3-803d51de159e|after_update3|
|rider-005$fde65748-6445-4efe-92d5-0b984be3ea1c|after_update3|
|rider-005$1413ef07-52fa-47cd-8b9c-1d10f6b57752|after_update3|
+----------------------------------------------+-------------+
=================================================================================
|20241205080439421 |20241205080439421_0_132|08402769-0bb0-484d-8966-82b4d4198c4d|2016/03/15 |78a46571-45f7-4099-a73f-4abd50c3b99b-0|false |08402769-0bb0-484d-8966-82b4d4198c4d|0.055057743379837376|0.7801762703617553 |USD |driver-005|0.11169136641284616|0.6727625196189504 |22.00554827947049 |2016/03/15|2016/03/15 |rider-005|0 |UBERX |after_delete|
+----------------------------------------------+------------+
|key |after_delete|
+----------------------------------------------+------------+
|rider-005$08402769-0bb0-484d-8966-82b4d4198c4d|after_delete|
+----------------------------------------------+------------+
We can see that the record is not removed from the table, so its SI entry is
not removed either.
{code}
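The check behind Evidence 1 can be sketched in plain Scala, without a Spark session. This is only a rough sketch: it assumes that SI keys have the form <secondaryKey>$<recordKey> as in the dumps above, and that record keys never contain '$'. DeleteCheck, splitSiKey and survivors are hypothetical names for illustration, not Hudi APIs.
{code:scala}
object DeleteCheck {
  // Assumption: an SI key looks like "<secondaryKey>$<recordKey>".
  // Split on the last '$' and return (secondaryKey, recordKey).
  def splitSiKey(key: String): Option[(String, String)] = {
    val i = key.lastIndexOf('$')
    if (i <= 0 || i == key.length - 1) None
    else Some((key.substring(0, i), key.substring(i + 1)))
  }

  // For a set of deleted record keys, return those that are still
  // present in the table and those that still have an SI entry.
  def survivors(deleted: Set[String],
                tableKeys: Set[String],
                siKeys: Seq[String]): (Set[String], Set[String]) = {
    val siRecordKeys = siKeys.flatMap(k => splitSiKey(k)).map(_._2).toSet
    (deleted intersect tableKeys, deleted intersect siRecordKeys)
  }
}

// Usage with the record key from Evidence 1.
val deleted = Set("08402769-0bb0-484d-8966-82b4d4198c4d")
val (stillInTable, stillInSi) = DeleteCheck.survivors(
  deleted,
  tableKeys = Set("08402769-0bb0-484d-8966-82b4d4198c4d"),
  siKeys = Seq("rider-005$08402769-0bb0-484d-8966-82b4d4198c4d"))
{code}
If both returned sets are non-empty, the delete did not take effect on the data, and the leftover SI entry is a consequence rather than a separate SI bug.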
Evidence 2:
{code:java}
scala> df.filter("_row_key = 'b792ad9a-b378-42d4-8677-cf081951bdfc'").show(false)
+-------------------+--------------------+------------------+----------------------+-----------------+------------------+--------+---------+---------+--------+------+-------+-------+----+---------+--------------+-----+---------+---------+
|_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|_hoodie_file_name|_hoodie_is_deleted|_row_key|begin_lat|begin_lon|currency|driver|end_lat|end_lon|fare|partition|partition_path|rider|timestamp|trip_type|
+-------------------+--------------------+------------------+----------------------+-----------------+------------------+--------+---------+---------+--------+------+-------+-------+----+---------+--------------+-----+---------+---------+
+-------------------+--------------------+------------------+----------------------+-----------------+------------------+--------+---------+---------+--------+------+-------+-------+----+---------+--------------+-----+---------+---------+
scala> sidf.filter("contains(key, 'b792ad9a-b378-42d4-8677-cf081951bdfc')").show(false)
+----------------------------------------------+----+------------------+-------------------+-------------------+-------------------+----------------------+
|key |type|filesystemMetadata|BloomFilterMetadata|ColumnStatsMetadata|recordIndexMetadata|SecondaryIndexMetadata|
+----------------------------------------------+----+------------------+-------------------+-------------------+-------------------+----------------------+
|rider-004$b792ad9a-b378-42d4-8677-cf081951bdfc|7 |null |null |null |null |{false} |
+----------------------------------------------+----+------------------+-------------------+-------------------+-------------------+----------------------+
+----------------------------------------------+-------------+
|key |after_update2|
+----------------------------------------------+-------------+
|rider-004$d853cd23-bdcf-4f48-857b-efa8e7b37537|after_update2|
|rider-004$51c7db5f-6d9f-420c-b3bb-affd391a8e53|after_update2|
|rider-004$b792ad9a-b378-42d4-8677-cf081951bdfc|after_update2|
|rider-004$2a220e81-3d6c-4877-a11d-7318352430aa|after_update2|
|rider-004$eef4e0ef-9630-4b2d-9e87-3c058d96878b|after_update2|
+----------------------------------------------+-------------+
clustering <---- My guess is that clustering produces another index entry without
cleaning up the previous one. Let us verify this.
update3
+----------------------------------------------+-------------+
|key |after_update3|
+----------------------------------------------+-------------+
|rider-005$b792ad9a-b378-42d4-8677-cf081951bdfc|after_update3|
|rider-004$b792ad9a-b378-42d4-8677-cf081951bdfc|after_update3| <<------------------------ Not removed.
|rider-005$eef4e0ef-9630-4b2d-9e87-3c058d96878b|after_update3|
|rider-005$d853cd23-bdcf-4f48-857b-efa8e7b37537|after_update3|
|rider-005$51c7db5f-6d9f-420c-b3bb-affd391a8e53|after_update3|
|rider-005$2a220e81-3d6c-4877-a11d-7318352430aa|after_update3|
+----------------------------------------------+-------------+
Since there is a clustering happening after update3, let us check the order of the instants on the timeline.
-rw-r--r--@ 1 linliu staff 2561 Dec 5 09:13 20241205171301747.deltacommit.inflight
-rw-r--r--@ 1 linliu staff 0 Dec 5 09:13 20241205171301747.deltacommit.requested
-rw-r--r--@ 1 linliu staff 3905 Dec 5 09:13 20241205171301747_20241205171307635.deltacommit
-rw-r--r--@ 1 linliu staff 0 Dec 5 09:13 20241205171308218.indexing.inflight
-rw-r--r--@ 1 linliu staff 828 Dec 5 09:13 20241205171308218.indexing.requested
-rw-r--r--@ 1 linliu staff 1148 Dec 5 09:13 20241205171308218_20241205171310354.indexing
-rw-r--r--@ 1 linliu staff 2834 Dec 5 09:13 20241205171315938.deltacommit.inflight
-rw-r--r--@ 1 linliu staff 0 Dec 5 09:13 20241205171315938.deltacommit.requested
-rw-r--r--@ 1 linliu staff 4181 Dec 5 09:13 20241205171315938_20241205171317797.deltacommit
-rw-r--r--@ 1 linliu staff 2834 Dec 5 09:13 20241205171344951.deltacommit.inflight
-rw-r--r--@ 1 linliu staff 0 Dec 5 09:13 20241205171344951.deltacommit.requested
-rw-r--r--@ 1 linliu staff 4187 Dec 5 09:13 20241205171344951_20241205171346135.deltacommit
-rw-r--r--@ 1 linliu staff 0 Dec 5 09:13 20241205171346227.compaction.inflight
-rw-r--r--@ 1 linliu staff 3401 Dec 5 09:13 20241205171346227.compaction.requested
-rw-r--r--@ 1 linliu staff 3954 Dec 5 09:13 20241205171346227_20241205171346943.commit
-rw-r--r--@ 1 linliu staff 0 Dec 5 09:13 20241205171347019.clustering.inflight
-rw-r--r--@ 1 linliu staff 3785 Dec 5 09:13 20241205171347019.clustering.requested
-rw-r--r--@ 1 linliu staff 4257 Dec 5 09:13 20241205171347019_20241205171348097.replacecommit
-rw-r--r--@ 1 linliu staff 2834 Dec 5 09:14 20241205171413223.deltacommit.inflight
-rw-r--r--@ 1 linliu staff 0 Dec 5 09:14 20241205171413223.deltacommit.requested
-rw-r--r--@ 1 linliu staff 4193 Dec 5 09:14 20241205171413223_20241205171414466.deltacommit
-rw-r--r--@ 1 linliu staff 2546 Dec 5 09:14 20241205171440060.deltacommit.inflight
-rw-r--r--@ 1 linliu staff 0 Dec 5 09:14 20241205171440060.deltacommit.requested
-rw-r--r--@ 1 linliu staff 3619 Dec 5 09:14 20241205171440060_20241205171441699.deltacommit
{code}
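The two checks used in Evidence 2 can be sketched in plain Scala as well. The sketch below is an illustration under assumptions, not Hudi code: it assumes SI keys of the form <secondaryKey>$<recordKey>, and completed timeline files named <requestedTime>_<completionTime>.<action> with fixed-width timestamps, as in the listing above. SiAnalysis, duplicates and completedInstants are hypothetical names.
{code:scala}
object SiAnalysis {
  // Group SI keys by record key and keep only the record keys that map
  // to more than one distinct secondary key (e.g. rider-004 and rider-005).
  def duplicates(siKeys: Seq[String]): Map[String, Seq[String]] =
    siKeys
      .flatMap { k =>
        val i = k.lastIndexOf('$')
        if (i <= 0 || i == k.length - 1) None
        else Some(k.substring(i + 1) -> k.substring(0, i))
      }
      .groupBy(_._1)
      .map { case (recordKey, pairs) => recordKey -> pairs.map(_._2).distinct.sorted }
      .filter(_._2.size > 1)

  final case class Completed(requested: String, completed: String, action: String)

  // Matches only completed instants; requested/inflight files have no '_'.
  private val CompletedName = """(\d+)_(\d+)\.(\w+)""".r

  // Completed instants sorted by completion time. Assumes fixed-width
  // timestamps, so lexicographic order equals chronological order.
  def completedInstants(fileNames: Seq[String]): Seq[Completed] =
    fileNames
      .collect { case CompletedName(r, c, a) => Completed(r, c, a) }
      .sortBy(_.completed)
}

// Usage with keys from the after_update3 dump and files from the listing.
val dups = SiAnalysis.duplicates(Seq(
  "rider-005$b792ad9a-b378-42d4-8677-cf081951bdfc",
  "rider-004$b792ad9a-b378-42d4-8677-cf081951bdfc",
  "rider-005$eef4e0ef-9630-4b2d-9e87-3c058d96878b"))
val order = SiAnalysis.completedInstants(Seq(
  "20241205171413223_20241205171414466.deltacommit",
  "20241205171347019.clustering.inflight",
  "20241205171347019_20241205171348097.replacecommit")).map(_.action)
{code}
A non-empty result from duplicates points at issue 2, and the sorted actions show where the replacecommit falls relative to the neighbouring deltacommits.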
> Flaky test: TestSecondaryIndex. "Secondary Index With Updates Compaction
> Clustering Deletes"
> --------------------------------------------------------------------------------------------
>
> Key: HUDI-8648
> URL: https://issues.apache.org/jira/browse/HUDI-8648
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Lin Liu
> Assignee: Lin Liu
> Priority: Major
>
> The error stack:
>
> {code:java}
> - Test Secondary Index With Updates Compaction Clustering Deletes *** FAILED ***
> org.opentest4j.AssertionFailedError: expected: <true> but was: <false>
> at org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:55)
> at org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:40)
> at org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:35)
> at org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:179)
> at org.apache.spark.sql.hudi.command.index.TestSecondaryIndex.$anonfun$validateSecondaryIndex$1(TestSecondaryIndex.scala:547)
> at org.apache.spark.sql.hudi.command.index.TestSecondaryIndex.$anonfun$validateSecondaryIndex$1$adapted(TestSecondaryIndex.scala:540)
> at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
> at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
> at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
> at org.apache.spark.sql.hudi.command.index.TestSecondaryIndex.validateSecondaryIndex(TestSecondaryIndex.scala:540)
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)