[
https://issues.apache.org/jira/browse/HUDI-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sivabalan narayanan updated HUDI-1608:
--------------------------------------
Summary: MOR w/ fetches all records for read optimized query (was: MOR w/
global bloom fetches all records for read optimized query)
> MOR w/ fetches all records for read optimized query
> ---------------------------------------------------
>
> Key: HUDI-1608
> URL: https://issues.apache.org/jira/browse/HUDI-1608
> Project: Apache Hudi
> Issue Type: Bug
> Components: Spark Integration
> Affects Versions: 0.7.0
> Reporter: sivabalan narayanan
> Priority: Major
> Labels: sev:critical, user-support-issues
>
> Script to reproduce in local spark:
> [https://gist.github.com/nsivabalan/7250b794788516f1aec35650c2632364]
> ```
> scala> spark.sql("select _hoodie_commit_time, _hoodie_record_key, _hoodie_partition_path, id, __op from hudi_trips_snapshot order by _hoodie_record_key").show(false)
> +-------------------+------------------+----------------------+---+----+
> |_hoodie_commit_time|_hoodie_record_key|_hoodie_partition_path|id |__op|
> +-------------------+------------------+----------------------+---+----+
> |20210210065058     |1                 |1970-01-01            |1  |null|
> |20210210065127     |2                 |2020-01-04            |2  |D   |
> |20210210065127     |3                 |1970-01-01            |3  |D   |
> |20210210065127     |4                 |2020-01-01            |4  |U   |
> |20210210065058     |5                 |2020-01-01            |5  |I   |
> |20210210065127     |6                 |1998-04-13            |6  |I   |
> +-------------------+------------------+----------------------+---+----+
> ```
> After an upsert, the read optimized query returns records from both
> commits (C1 and C2). Also, I don't find any log files in the partitions;
> all of them are parquet files.
>
> ```
> ls /tmp/hudi_trips_cow/1998-04-13/
> 0d1e6a84-d036-42e9-806e-a3075b6bc677-0_1-23-12025_20210210065058.parquet
> 0d1e6a84-d036-42e9-806e-a3075b6bc677-0_1-61-25595_20210210065127.parquet
> ls /tmp/hudi_trips_cow/1970-01-01/
> 7b836833-a656-485d-967a-871bdc653dc3-0_2-61-25596_20210210065127.parquet
> 7b836833-a656-485d-967a-871bdc653dc3-0_3-23-12027_20210210065058.parquet
> ```
>
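For anyone triaging the listing above, here is a minimal sketch (not Hudi code) that groups the listed base files by file group and keeps the newest one per group. It assumes the usual Hudi base file naming of <fileId>_<writeToken>_<instantTime>.parquet; the names BaseFile, parseBaseFile and latestPerFileGroup are hypothetical helpers made up for this illustration. A read optimized query would normally be expected to read only the latest base file of each file group, so the sketch may help narrow down which files should have been picked.

```scala
// Hypothetical helpers for illustration only; none of this is Hudi API.
// Assumed naming convention: <fileId>_<writeToken>_<instantTime>.parquet
final case class BaseFile(fileId: String, writeToken: String, instantTime: String, name: String)

def parseBaseFile(name: String): Option[BaseFile] =
  if (!name.endsWith(".parquet")) None
  else name.stripSuffix(".parquet").split("_") match {
    case Array(fileId, token, instant) => Some(BaseFile(fileId, token, instant, name))
    case _                             => None // unexpected shape, skip it
  }

// Keep the base file with the greatest instant time per file group.
// Instant times are fixed-width timestamps, so string comparison orders them correctly.
def latestPerFileGroup(names: Seq[String]): Map[String, BaseFile] =
  names.flatMap(n => parseBaseFile(n).toList)
    .groupBy(_.fileId)
    .map { case (fileId, files) => fileId -> files.maxBy(_.instantTime) }

// File names copied from the two partition listings in this issue.
val listing = Seq(
  "0d1e6a84-d036-42e9-806e-a3075b6bc677-0_1-23-12025_20210210065058.parquet",
  "0d1e6a84-d036-42e9-806e-a3075b6bc677-0_1-61-25595_20210210065127.parquet",
  "7b836833-a656-485d-967a-871bdc653dc3-0_2-61-25596_20210210065127.parquet",
  "7b836833-a656-485d-967a-871bdc653dc3-0_3-23-12027_20210210065058.parquet"
)

val latest = latestPerFileGroup(listing)
latest.values.foreach(f => println(s"${f.fileId} -> ${f.name}"))
```

Each of the two file groups in the listing has one base file per commit, so under the stated assumption only the files written at 20210210065127 would be the latest ones.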