codope opened a new pull request, #8342:
URL: https://github.com/apache/hudi/pull/8342
### Change Logs
Clustering a bootstrap table (`METADATA_ONLY` bootstrap mode) with the row writer disabled produced incorrect results: only the meta fields were populated, while the data columns were null. This PR fixes the bug by adding a separate `HoodieBootstrapFileReader` that stitches the meta columns (from the bootstrap skeleton file) together with the data columns (from the original source file).
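
To illustrate what "stitching" means here, the following is a minimal, hypothetical Java sketch. It is not the actual `HoodieBootstrapFileReader` code. It models rows as column-name-to-value maps and assumes that the skeleton file (meta columns) and the source file (data columns) contain the same records in the same order, so the i-th rows of the two files are merged into one output row. The class name `StitchingReaderSketch` is illustrative only.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical sketch, not Hudi's HoodieBootstrapFileReader API. Rows are modeled as
// column-name -> value maps. ASSUMPTION: the skeleton file (meta columns) and the
// source file (data columns) hold the same records in the same order, so the i-th
// skeleton row belongs with the i-th data row.
class StitchingReaderSketch implements Iterator<Map<String, Object>> {
  private final Iterator<Map<String, Object>> skeletonRows;
  private final Iterator<Map<String, Object>> dataRows;

  StitchingReaderSketch(Iterator<Map<String, Object>> skeletonRows,
                        Iterator<Map<String, Object>> dataRows) {
    this.skeletonRows = skeletonRows;
    this.dataRows = dataRows;
  }

  @Override
  public boolean hasNext() {
    boolean moreSkeleton = skeletonRows.hasNext();
    boolean moreData = dataRows.hasNext();
    // Positional stitching is only valid if both inputs run out at the same time.
    if (moreSkeleton != moreData) {
      throw new IllegalStateException("skeleton and data files have different row counts");
    }
    return moreSkeleton;
  }

  @Override
  public Map<String, Object> next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    // Meta columns first, then data columns, so no column is left null.
    Map<String, Object> row = new LinkedHashMap<>(skeletonRows.next());
    row.putAll(dataRows.next());
    return row;
  }

  public static void main(String[] args) {
    List<Map<String, Object>> skeleton = List.of(
        Map.of("_hoodie_record_key", "trip_0", "_hoodie_partition_path", "datestr=2018"));
    List<Map<String, Object>> data = List.of(
        Map.of("_row_key", "trip_0", "rider", "rider_0"));
    StitchingReaderSketch reader = new StitchingReaderSketch(skeleton.iterator(), data.iterator());
    while (reader.hasNext()) {
      System.out.println(reader.next());
    }
  }
}
```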
Before this fix, a snapshot query after clustering the bootstrap table returned:
```
+-------------------+--------------------+------------------+----------------------+--------------------------------------------------------------------+-------------+--------+--------------+--------+---------+--------------------+-------------------+--------------------+---------------------+-------------------------+---------------------------+------------------+------------+
|_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|_hoodie_file_name                                                   |timestamp    |_row_key|partition_path|rider   |driver   |begin_lat           |begin_lon          |end_lat             |end_lon              |fare                     |tip_history                |_hoodie_is_deleted|datestr     |
+-------------------+--------------------+------------------+----------------------+--------------------------------------------------------------------+-------------+--------+--------------+--------+---------+--------------------+-------------------+--------------------+---------------------+-------------------------+---------------------------+------------------+------------+
|00000000000001     |00000000000001_5_0  |trip_0            |datestr=2018          |a80c61fe-89a1-488c-8daa-dac6cea52dfb_5-10-37_00000000000001.parquet |1680265327825|trip_0  |1680265327825 |rider_0 |driver_0 |0.2909073141582583  |0.6713659942455674 |0.3199873855402988  |0.8901008450132192   |[13.409427386679251, USD]|[[76.98430157746769, USD]] |false             |datestr=2018|
|00000000000001     |00000000000001_7_0  |trip_3            |datestr=2018          |45b13f98-496e-4107-8c66-5ce88ab69940_7-10-39_00000000000001.parquet |1680265327825|trip_3  |1680265327825 |rider_3 |driver_3 |0.13139874521266626 |0.9288890012418678 |0.19960441648570804 |0.028970072867536834 |[3.934944937321838, USD] |[[60.94692580064911, USD]] |false             |datestr=2018|
|00000000000001     |00000000000001_7_1  |trip_4            |datestr=2018          |45b13f98-496e-4107-8c66-5ce88ab69940_7-10-39_00000000000001.parquet |1680265327825|trip_4  |1680265327825 |rider_4 |driver_4 |0.19148119051373647 |0.3121563466437075 |0.07312220393022284 |0.4623498809657779   |[84.27465303833377, USD] |[[48.54971480008592, USD]] |false             |datestr=2018|
|00000000000001     |00000000000001_6_0  |trip_2            |datestr=2018          |4b6f5614-cfe9-42cd-bd0c-09667714a6a3_6-10-38_00000000000001.parquet |1680265327825|trip_2  |1680265327825 |rider_2 |driver_2 |0.29293250471488286 |0.8169497077647824 |0.4575395537485407  |0.37034912499009554  |[65.48417669107184, USD] |[[51.323010501226705, USD]]|false             |datestr=2018|
|00000000000001     |00000000000001_4_0  |trip_1            |datestr=2018          |2f656a57-d3d3-453b-bea3-beb3f86a2cfc_4-10-36_00000000000001.parquet |1680265327825|trip_1  |1680265327825 |rider_1 |driver_1 |0.7593035032651309  |0.4695942868315275 |0.04062310794619961 |0.7483312940246941   |[99.53761667379452, USD] |[[36.68130227843157, USD]] |false             |datestr=2018|
|00000000000001     |00000000000001_8_0  |trip_6            |datestr=2019          |0e43fd89-9294-4630-8f7b-b782f15377b8_8-10-40_00000000000001.parquet |1680265327825|trip_6  |1680265327825 |rider_6 |driver_6 |0.6576893480206276  |0.20124822123740116|0.5587907101480606  |0.0087676912597352   |[46.3596114051868, USD]  |[[1.4482069738172454, USD]]|false             |datestr=2019|
|00000000000001     |00000000000001_9_0  |trip_5            |datestr=2019          |1260bd0a-e1b0-469e-9407-c0952a2e5bce_9-10-41_00000000000001.parquet |1680265327825|trip_5  |1680265327825 |rider_5 |driver_5 |0.8780482394034513  |0.45016664520520033|0.1210946590521833  |0.559346262842122    |[3.980544730087332, USD] |[[11.81867856830614, USD]] |false             |datestr=2019|
|00000000000001     |00000000000001_10_0 |trip_7            |datestr=2019          |ca2423ad-40f9-437a-a009-bf5b14cedb34_10-10-42_00000000000001.parquet|1680265327825|trip_7  |1680265327825 |rider_7 |driver_7 |0.8539282876074638  |0.6288419331027626 |0.1199959028048404  |0.19234888544292428  |[17.28229998461128, USD] |[[77.49172321783067, USD]] |false             |datestr=2019|
|00000000000001     |00000000000001_11_0 |trip_8            |datestr=2019          |11502732-a705-4f63-9b8e-3ace93d8c9f4_11-10-43_00000000000001.parquet|1680265327825|trip_8  |1680265327825 |rider_8 |driver_8 |0.5247015895548016  |0.09543754441513863|0.1510348079622863  |0.3036501516600335   |[18.50748211199097, USD] |[[80.26618263126355, USD]] |false             |datestr=2019|
|00000000000001     |00000000000001_11_1 |trip_9            |datestr=2019          |11502732-a705-4f63-9b8e-3ace93d8c9f4_11-10-43_00000000000001.parquet|1680265327825|trip_9  |1680265327825 |rider_9 |driver_9 |0.18732285899232892 |0.419057912375039  |0.9402509062992255  |0.7540875540699798   |[77.90400106882183, USD] |[[89.12865661547804, USD]] |false             |datestr=2019|
|00000000000001     |00000000000001_3_0  |trip_10           |datestr=2020          |75960f03-0093-438c-b71c-fe5eb02496e4_3-10-35_00000000000001.parquet |1680265327825|trip_10 |1680265327825 |rider_10|driver_10|0.7945595842585961  |0.849250587072739  |0.8016352053998793  |0.6664019129654204   |[68.54476863463951, USD] |[[78.73973533402236, USD]] |false             |datestr=2020|
|00000000000001     |00000000000001_0_0  |trip_12           |datestr=2020          |920c7f2e-0cc9-46b6-8780-3e3312ef133c_0-10-32_00000000000001.parquet |1680265327825|trip_12 |1680265327825 |rider_12|driver_12|0.26359097652813546 |0.3040963404277949 |0.783608220421833   |0.26773327561669813  |[8.899266098961778, USD] |[[63.19151746906088, USD]] |false             |datestr=2020|
|00000000000001     |00000000000001_1_0  |trip_13           |datestr=2020          |3cc87619-d56d-4a6d-9023-8af97824bfac_1-10-33_00000000000001.parquet |1680265327825|trip_13 |1680265327825 |rider_13|driver_13|0.037809287288638194|0.20234037038861052|0.7404294591470656  |0.29316985501104065  |[93.45037833211967, USD] |[[50.56012365982448, USD]] |false             |datestr=2020|
|00000000000001     |00000000000001_1_1  |trip_14           |datestr=2020          |3cc87619-d56d-4a6d-9023-8af97824bfac_1-10-33_00000000000001.parquet |1680265327825|trip_14 |1680265327825 |rider_14|driver_14|0.7519002026514892  |0.9448162986968871 |0.40054933992868635 |0.0038455626793925113|[15.880759811433354, USD]|[[84.44445639423378, USD]] |false             |datestr=2020|
|00000000000001     |00000000000001_2_0  |trip_11           |datestr=2020          |50b2f640-e284-49ef-a1f8-f5a819a0e7be_2-10-34_00000000000001.parquet |1680265327825|trip_11 |1680265327825 |rider_11|driver_11|0.23032054239540056 |0.9100367991551281 |0.022237439482133525|0.08921895796973023  |[68.27062120012675, USD] |[[39.13358730683697, USD]] |false             |datestr=2020|
+-------------------+--------------------+------------------+----------------------+--------------------------------------------------------------------+-------------+--------+--------------+--------+---------+--------------------+-------------------+--------------------+---------------------+-------------------------+---------------------------+------------------+------------+
```
After this fix, the same query returns:
```
+-------------------+--------------------+------------------+----------------------+--------------------------------------------------------------------+-------------+--------+--------------+--------+---------+--------------------+-------------------+--------------------+-------------------+-------------------------+---------------------------+------------------+------------+
|_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|_hoodie_file_name                                                   |timestamp    |_row_key|partition_path|rider   |driver   |begin_lat           |begin_lon          |end_lat             |end_lon            |fare                     |tip_history                |_hoodie_is_deleted|datestr     |
+-------------------+--------------------+------------------+----------------------+--------------------------------------------------------------------+-------------+--------+--------------+--------+---------+--------------------+-------------------+--------------------+-------------------+-------------------------+---------------------------+------------------+------------+
|00000000000001     |00000000000001_4_0  |trip_3            |datestr=2018          |bf676b9f-2e9b-48fd-a6a1-26d8139aecd0_4-10-36_00000000000001.parquet |1680265436528|trip_3  |1680265436528 |rider_3 |driver_3 |0.5082028544317309  |0.6186035619925132 |0.019487324652589844|0.34244473537926823|[92.41566255606341, USD] |[[67.8016115074926, USD]]  |false             |datestr=2018|
|00000000000001     |00000000000001_4_1  |trip_4            |datestr=2018          |bf676b9f-2e9b-48fd-a6a1-26d8139aecd0_4-10-36_00000000000001.parquet |1680265436528|trip_4  |1680265436528 |rider_4 |driver_4 |0.5182280625084768  |0.9253109379737152 |0.33233798005862314 |0.7110019996809055 |[4.44409622575439, USD]  |[[33.869898194219516, USD]]|false             |datestr=2018|
|00000000000001     |00000000000001_6_0  |trip_0            |datestr=2018          |0f4f79d1-c011-4bc9-9f89-8a4a2197ef56_6-10-38_00000000000001.parquet |1680265436528|trip_0  |1680265436528 |rider_0 |driver_0 |0.12176296539745046 |0.382558364451396  |0.0870559794514496  |0.27640429152343515|[92.1024811423022, USD]  |[[67.77835365292796, USD]] |false             |datestr=2018|
|00000000000001     |00000000000001_5_0  |trip_2            |datestr=2018          |dec41ce1-9fe8-4cf2-99ab-ffa25297a2da_5-10-37_00000000000001.parquet |1680265436528|trip_2  |1680265436528 |rider_2 |driver_2 |0.5522660335262106  |0.7589583434997402 |0.6039198595852253  |0.8361083230362024 |[78.0609254553147, USD]  |[[27.858948192411514, USD]]|false             |datestr=2018|
|00000000000001     |00000000000001_7_0  |trip_1            |datestr=2018          |df6c3238-a4f2-4796-b582-2e427c0e1dcd_7-10-39_00000000000001.parquet |1680265436528|trip_1  |1680265436528 |rider_1 |driver_1 |0.7389516331004687  |0.28811408775028   |0.7200780424137405  |0.484662130326595  |[1.6077573601573025, USD]|[[10.341913607318709, USD]]|false             |datestr=2018|
|00000000000001     |00000000000001_8_0  |trip_5            |datestr=2019          |37ffa376-a86d-4706-a785-2711fe13aa78_8-10-40_00000000000001.parquet |1680265436528|trip_5  |1680265436528 |rider_5 |driver_5 |0.1630151212353752  |0.27057428081894186|0.3808059886411259  |0.3692283742910598 |[31.179184715024654, USD]|[[93.96021299492908, USD]] |false             |datestr=2019|
|00000000000001     |00000000000001_9_0  |trip_6            |datestr=2019          |4b1783be-5cae-4ef9-9030-60cbce595531_9-10-41_00000000000001.parquet |1680265436528|trip_6  |1680265436528 |rider_6 |driver_6 |0.5420218856799521  |0.3717532476763643 |0.7316585090626965  |0.5182677308446296 |[49.210873427144186, USD]|[[2.034155984429642, USD]] |false             |datestr=2019|
|00000000000001     |00000000000001_10_0 |trip_8            |datestr=2019          |8af5538e-bbe9-4291-8814-bc5e356f90dc_10-10-42_00000000000001.parquet|1680265436528|trip_8  |1680265436528 |rider_8 |driver_8 |0.8253202558194069  |0.8769063071666001 |0.9978855323416493  |0.07003530632543731|[22.31002279951365, USD] |[[30.365612077091576, USD]]|false             |datestr=2019|
|00000000000001     |00000000000001_10_1 |trip_9            |datestr=2019          |8af5538e-bbe9-4291-8814-bc5e356f90dc_10-10-42_00000000000001.parquet|1680265436528|trip_9  |1680265436528 |rider_9 |driver_9 |0.31560399915225323 |0.496779058144757  |0.6974261081429741  |0.9073312408362796 |[87.04727640702991, USD] |[[96.17579621323826, USD]] |false             |datestr=2019|
|00000000000001     |00000000000001_11_0 |trip_7            |datestr=2019          |eea5a961-af77-4834-9abf-73cc5bf20eff_11-10-43_00000000000001.parquet|1680265436528|trip_7  |1680265436528 |rider_7 |driver_7 |0.08038761693792418 |0.632904243467236  |0.555660576167659   |0.4872642442124486 |[13.555441426862014, USD]|[[10.544626239374132, USD]]|false             |datestr=2019|
|00000000000001     |00000000000001_3_0  |trip_12           |datestr=2020          |1cc534c7-1e6f-4412-bd6c-c1855070974a_3-10-35_00000000000001.parquet |1680265436528|trip_12 |1680265436528 |rider_12|driver_12|0.004079088549327037|0.16874021976709552|0.20828594874323636 |0.895462473317559  |[92.18052838420539, USD] |[[44.650703399553215, USD]]|false             |datestr=2020|
|00000000000001     |00000000000001_1_0  |trip_11           |datestr=2020          |ffc5ecc7-46e9-4fe4-a6ec-7adbf1e31e33_1-10-33_00000000000001.parquet |1680265436528|trip_11 |1680265436528 |rider_11|driver_11|0.16583914122830068 |0.28708446826172784|0.6707401823203576  |0.20113024584157368|[13.875727591686381, USD]|[[52.648852351275025, USD]]|false             |datestr=2020|
|00000000000001     |00000000000001_0_0  |trip_10           |datestr=2020          |09f09c46-8d3a-420d-a7d5-435c6280b161_0-10-32_00000000000001.parquet |1680265436528|trip_10 |1680265436528 |rider_10|driver_10|0.7531144860222685  |0.9217065388363564 |0.12736143989601045 |0.6846542499163221 |[85.46301785894622, USD] |[[67.94440570686055, USD]] |false             |datestr=2020|
|00000000000001     |00000000000001_2_0  |trip_13           |datestr=2020          |bb0e66b1-a853-4075-a9ce-f5150d1db17e_2-10-34_00000000000001.parquet |1680265436528|trip_13 |1680265436528 |rider_13|driver_13|0.37536133167833274 |0.13380768426991696|0.7165151686625107  |0.4484507140549401 |[37.18742431963579, USD] |[[29.610616003915634, USD]]|false             |datestr=2020|
|00000000000001     |00000000000001_2_1  |trip_14           |datestr=2020          |bb0e66b1-a853-4075-a9ce-f5150d1db17e_2-10-34_00000000000001.parquet |1680265436528|trip_14 |1680265436528 |rider_14|driver_14|0.5579756297430776  |0.39976488479239436|0.4722872205073937  |0.10655015779417953|[95.73825510010874, USD] |[[94.31603336355222, USD]] |false             |datestr=2020|
+-------------------+--------------------+------------------+----------------------+--------------------------------------------------------------------+-------------+--------+--------------+--------+---------+--------------------+-------------------+--------------------+-------------------+-------------------------+---------------------------+------------------+------------+
```
### Impact
Bug fix for clustering on bootstrap tables (`METADATA_ONLY` bootstrap mode) with the row writer disabled.
### Risk level (write none, low, medium or high below)
low
During clustering, `HoodieBootstrapFileReader` is used only when the base file has a bootstrap path; all other base files are read as before.
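
A hypothetical sketch of the scoping described in this section: the stitching reader is chosen only for base files that carry a bootstrap path, and every other base file keeps the existing reader. `BaseFileInfo`, `ReaderKind`, and `chooseReader` are illustrative names, not Hudi classes or methods.

```java
import java.util.Optional;

// Hypothetical sketch; none of these names are Hudi classes or methods.
class ReaderSelectionSketch {
  enum ReaderKind { PLAIN_BASE_FILE_READER, BOOTSTRAP_STITCHING_READER }

  static final class BaseFileInfo {
    final String path;
    // ASSUMPTION: present only for base files created by METADATA_ONLY bootstrap.
    final Optional<String> bootstrapPath;

    BaseFileInfo(String path, Optional<String> bootstrapPath) {
      this.path = path;
      this.bootstrapPath = bootstrapPath;
    }
  }

  static ReaderKind chooseReader(BaseFileInfo baseFile) {
    // Base files without a bootstrap path keep the existing reader, so their behavior is unchanged.
    return baseFile.bootstrapPath.isPresent()
        ? ReaderKind.BOOTSTRAP_STITCHING_READER
        : ReaderKind.PLAIN_BASE_FILE_READER;
  }

  public static void main(String[] args) {
    // prints PLAIN_BASE_FILE_READER
    System.out.println(chooseReader(new BaseFileInfo("f1.parquet", Optional.empty())));
    // prints BOOTSTRAP_STITCHING_READER
    System.out.println(chooseReader(new BaseFileInfo("f2.parquet", Optional.of("/source/f2.parquet"))));
  }
}
```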
### Documentation Update
_Describe any necessary documentation update if there is any new feature, config, or user-facing change_
- _The config description must be updated if new configs are added or the default values of the configs are changed_
- _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._
### Contributor's checklist
- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed