[
https://issues.apache.org/jira/browse/HUDI-7354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Prabhu Joseph updated HUDI-7354:
--------------------------------
Attachment: cleanup.sql
> Flink Batch Read from Hudi table does not return any rows
> ---------------------------------------------------------
>
> Key: HUDI-7354
> URL: https://issues.apache.org/jira/browse/HUDI-7354
> Project: Apache Hudi
> Issue Type: Bug
> Components: flink-sql
> Affects Versions: 0.14.1
> Reporter: Prabhu Joseph
> Priority: Major
> Attachments: cleanup.sql, flink-hudi.sql
>
>
> A Flink batch read from a Hudi table does not return any rows. The same Flink
> SQL script returns the expected 8 rows on Hudi 0.14.0.
> *Repro Steps*
> 1. Flink 1.18.1 and Hudi 0.14.1
> 2. Open Flink YARN Session
> {code}
> flink-yarn-session -d \
>   -D execution.checkpointing.interval=10s \
>   -D state.checkpoint-storage=filesystem \
>   -D state.checkpoints.dir=s3://prabhuflinks3/test-output/flink/output/20eab3b1-d58a-491c-8819-15e451a549eb
> {code}
> 3. Place CSV Input Data
> {code}
> cat > data <<EOF
> 1,Danny,23
> 2,Stephen,33
> 3,Julian,53
> 4,Fabian,31
> 5,Sophia,18
> 6,Emma,20
> 7,Bob,44
> 8,Han,56
> EOF
> hadoop fs -mkdir -p \
>   s3://prabhuflinks3/test-output/flink/output/8d007d79-913d-4ed4-a6e4-9af591f24c36/csvinput/
> hadoop fs -put data \
>   s3://prabhuflinks3/test-output/flink/output/8d007d79-913d-4ed4-a6e4-9af591f24c36/csvinput/
> {code}
> 4. Run the attached Flink SQL script (flink-hudi.sql)
> {code}
> /usr/lib/flink/bin/sql-client.sh -f flink-hudi.sql
> {code}
> The script creates a Flink filesystem table over the 8-row CSV data. It then
> creates a Hudi table and inserts the data from the filesystem table. Finally,
> it runs a select query against the Hudi table. The select query does not
> return any rows.
> 5. Clean up the tables and databases using cleanup.sql
> *Analysis*
> The select query and the insert query run concurrently. The select query
> finishes immediately because the Hudi table contains no data yet, so it
> returns no rows. In Hudi 0.14.0, the select query waits until the data is
> loaded and then returns it.
>
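If the analysis above is right and the select simply runs before the insert job has finished, one possible way to check it is to make the SQL client execute DML synchronously through Flink's `table.dml-sync` option. The sketch below is an assumption and has not been verified against Hudi 0.14.1. `init.sql` is a hypothetical file name, and the block assumes flink-hudi.sql sits in the current directory.

```shell
# Hypothetical init file: make INSERT statements block until the job finishes,
# so the later SELECT only starts after the Hudi table has been written.
cat > init.sql <<'EOF'
SET 'table.dml-sync' = 'true';
EOF

# Path taken from the repro steps; skip the run if the client or script is absent.
SQL_CLIENT=/usr/lib/flink/bin/sql-client.sh
if [ -x "$SQL_CLIENT" ] && [ -f flink-hudi.sql ]; then
  # -i loads the init file before the statements in -f are executed.
  "$SQL_CLIENT" -i init.sql -f flink-hudi.sql
else
  echo "sql-client.sh or flink-hudi.sql not found; only init.sql was written"
fi
```

If the select returns the 8 rows with this setting, that would support the timing explanation. It would not explain why the behavior differs from 0.14.0.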