[ https://issues.apache.org/jira/browse/HUDI-7354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Prabhu Joseph updated HUDI-7354:
--------------------------------
Description:
A Flink batch read from a Hudi table returns no rows on Hudi 0.14.1. The same
Flink SQL script returns 8 rows, as expected, on Hudi 0.14.0.
*Repro Steps*
1. Use Flink 1.18.1 and Hudi 0.14.1 (the affected version)
2. Start a Flink YARN session
{code}
flink-yarn-session -d \
  -D execution.checkpointing.interval=10s \
  -D state.checkpoint-storage=filesystem \
  -D state.checkpoints.dir=s3://prabhuflinks3/test-output/flink/output/20eab3b1-d58a-491c-8819-15e451a549eb
{code}
3. Upload the CSV input data
{code}
cat > data <<EOF
1,Danny,23
2,Stephen,33
3,Julian,53
4,Fabian,31
5,Sophia,18
6,Emma,20
7,Bob,44
8,Han,56
EOF
hadoop fs -mkdir -p \
  s3://prabhuflinks3/test-output/flink/output/8d007d79-913d-4ed4-a6e4-9af591f24c36/csvinput/
hadoop fs -put data \
  s3://prabhuflinks3/test-output/flink/output/8d007d79-913d-4ed4-a6e4-9af591f24c36/csvinput/
{code}
4. Run the attached Flink SQL script (flink-hudi.sql)
{code}
/usr/lib/flink/bin/sql-client.sh -f flink-hudi.sql
{code}
The script creates a Flink filesystem table over the 8-row CSV data. It then
creates a Hudi table and inserts the data from the filesystem table into it.
Finally, it runs a SELECT query against the Hudi table. The SELECT query does
not return any rows.
5. Clean up the tables and databases using cleanup.sql
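Before running the script, it may help to rule out a problem with the input itself. The following is a minimal local check, assuming the file from step 3 is named data and is written to the current directory. The variable names rows and bad are only illustrative.

```shell
# Recreate the sample input from step 3 (the same 8 rows).
cat > data <<EOF
1,Danny,23
2,Stephen,33
3,Julian,53
4,Fabian,31
5,Sophia,18
6,Emma,20
7,Bob,44
8,Han,56
EOF

# Count the rows; the repro expects one line per record.
rows=$(wc -l < data | tr -d ' ')
echo "rows=$rows"

# Count the lines that do not have exactly three comma-separated fields.
bad=$(awk -F',' 'NF != 3' data | wc -l | tr -d ' ')
echo "malformed=$bad"
```

If the row count is 8 and no line is malformed, the empty result is unlikely to come from the CSV input.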
*Analysis*
The INSERT and SELECT queries run concurrently. The SELECT query finishes
almost immediately because the Hudi table has no data yet. On Hudi 0.14.0, the
SELECT query waits until the data has been loaded and then returns it.
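One way to test this analysis might be to force the SQL client to run the statements one after another. The sketch below writes a hypothetical script, flink-hudi-sync.sql, that sets the Flink option table.dml-sync to true so the client waits for the INSERT job to finish before it submits the SELECT. The attached flink-hudi.sql is not reproduced in this report, so the table names (csv_input, hudi_output), the column names, the COPY_ON_WRITE table type, and the <bucket>/<prefix> paths are all assumptions, not the reporter's actual script.

```shell
# Hypothetical variant of the repro script; it is NOT the attached flink-hudi.sql.
# Table names, columns, and the <bucket>/<prefix> paths are placeholders.
cat > flink-hudi-sync.sql <<'EOF'
SET 'execution.runtime-mode' = 'batch';
-- Make the SQL client block on each INSERT until its job has finished.
SET 'table.dml-sync' = 'true';

-- Filesystem table over the uploaded CSV input.
CREATE TABLE csv_input (
  id INT,
  name STRING,
  age INT
) WITH (
  'connector' = 'filesystem',
  'path' = 's3://<bucket>/<prefix>/csvinput/',
  'format' = 'csv'
);

-- Hudi table keyed on id (table type assumed for this sketch).
CREATE TABLE hudi_output (
  id INT PRIMARY KEY NOT ENFORCED,
  name STRING,
  age INT
) WITH (
  'connector' = 'hudi',
  'path' = 's3://<bucket>/<prefix>/hudi_output/',
  'table.type' = 'COPY_ON_WRITE'
);

INSERT INTO hudi_output SELECT id, name, age FROM csv_input;

SELECT * FROM hudi_output;
EOF
```

The file can be run the same way as in step 4, with sql-client.sh -f flink-hudi-sync.sql, after replacing the placeholder paths. If the SELECT returns all 8 rows with table.dml-sync enabled, that would support the explanation that the queries overlap on the affected version.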
was:
Flink Batch Read from Hudi table does not return any rows. The same flink sql
script returns 8 rows as expected on 0.14.0 Hudi version.
*Repro Steps*
1. Flink 1.18.1 and Hudi 0.14.0
2. Open Flink YARN Session
{code}
flink-yarn-session -d \
  -D execution.checkpointing.interval=10s \
  -D state.checkpoint-storage=filesystem \
  -D state.checkpoints.dir=s3://prabhuflinks3/test-output/flink/output/20eab3b1-d58a-491c-8819-15e451a549eb
{code}
3. Place CSV Input Data
{code}
cat > data <<EOF
1,Danny,23
2,Stephen,33
3,Julian,53
4,Fabian,31
5,Sophia,18
6,Emma,20
7,Bob,44
8,Han,56
EOF
hadoop fs -mkdir -p \
  s3://prabhuflinks3/test-output/flink/output/8d007d79-913d-4ed4-a6e4-9af591f24c36/csvinput/
hadoop fs -put data \
  s3://prabhuflinks3/test-output/flink/output/8d007d79-913d-4ed4-a6e4-9af591f24c36/csvinput/
{code}
4. Run attached Flink sql script
{code}
/usr/lib/flink/bin/sql-client.sh -f flink-hudi-hive.sql
{code}
The script makes a flink filesystem table with CSV data of 8 rows. Then, it
forms a Hudi table and puts in the data from the filesystem table. Finally, it
runs a select query from the Hudi table. The select query does not return any
data.
*Analysis*
The select query and insert query run together. The select query ends quickly
since the Hudi table has no data yet. In Hudi 0.14.0, the select query waits
until the data loads and then retrieves it.
> Flink Batch Read from Hudi table does not return any rows
> ---------------------------------------------------------
>
> Key: HUDI-7354
> URL: https://issues.apache.org/jira/browse/HUDI-7354
> Project: Apache Hudi
> Issue Type: Bug
> Components: flink-sql
> Affects Versions: 0.14.1
> Reporter: Prabhu Joseph
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)