[jira] [Updated] (HUDI-7354) Flink Batch Read from Hudi table does not return any rows

Prabhu Joseph (Jira) Sat, 27 Jan 2024 23:35:27 -0800


     [ 
https://issues.apache.org/jira/browse/HUDI-7354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Prabhu Joseph updated HUDI-7354:
--------------------------------
    Description: 
A Flink batch read from a Hudi table does not return any rows. The same Flink 
SQL script returns 8 rows as expected on Hudi 0.14.0.


*Repro Steps*

1. Use Flink 1.18.1 and Hudi 0.14.1

2. Start a Flink YARN session
{code}
flink-yarn-session -d \
  -D execution.checkpointing.interval=10s \
  -D state.checkpoint-storage=filesystem \
  -D state.checkpoints.dir=s3://prabhuflinks3/test-output/flink/output/20eab3b1-d58a-491c-8819-15e451a549eb
{code}

3. Place CSV Input Data
{code}
cat > data <<EOF
1,Danny,23
2,Stephen,33
3,Julian,53
4,Fabian,31
5,Sophia,18
6,Emma,20
7,Bob,44
8,Han,56
EOF

hadoop fs -mkdir -p \
  s3://prabhuflinks3/test-output/flink/output/8d007d79-913d-4ed4-a6e4-9af591f24c36/csvinput/
hadoop fs -put data \
  s3://prabhuflinks3/test-output/flink/output/8d007d79-913d-4ed4-a6e4-9af591f24c36/csvinput/
{code}

4. Run the attached Flink SQL script (flink-hudi.sql)
{code}
/usr/lib/flink/bin/sql-client.sh -f flink-hudi.sql
{code}


The script creates a Flink filesystem table over the CSV data (8 rows), creates 
a Hudi table, and inserts the rows from the filesystem table into it. Finally, 
it runs a SELECT query against the Hudi table. The SELECT query does not return 
any rows.

5. Clean up the tables and databases using cleanup.sql
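
The script from step 4 is attached to the issue and is not reproduced in this 
message. As a rough illustration of the three stages described there (filesystem 
table, Hudi table plus insert, select), a minimal sketch might look like the 
following. The file name flink-hudi-sketch.sql, the table names, the column 
names, the COPY_ON_WRITE table type, and the <bucket>/<prefix> placeholders are 
all assumptions, not taken from the attachment.

```shell
# Hypothetical sketch only: the real flink-hudi.sql attachment may differ.
# Table names, column names, and <bucket>/<prefix> are placeholders.
cat > flink-hudi-sketch.sql <<'EOF'
-- Run the session in batch mode.
SET 'execution.runtime-mode' = 'batch';

-- Filesystem table over the CSV input from step 3.
CREATE TABLE csv_source (
  id INT,
  name STRING,
  age INT
) WITH (
  'connector' = 'filesystem',
  'path' = 's3://<bucket>/<prefix>/csvinput/',
  'format' = 'csv'
);

-- Hudi table that receives the rows.
CREATE TABLE hudi_sink (
  id INT PRIMARY KEY NOT ENFORCED,
  name STRING,
  age INT
) WITH (
  'connector' = 'hudi',
  'path' = 's3://<bucket>/<prefix>/hudi/',
  'table.type' = 'COPY_ON_WRITE'
);

INSERT INTO hudi_sink SELECT id, name, age FROM csv_source;

-- Expected to return the 8 input rows.
SELECT * FROM hudi_sink;
EOF
```

A file like this would be passed to sql-client.sh with -f in the same way as in 
step 4.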


*Analysis*

The SELECT query and the INSERT query run concurrently. The SELECT query 
finishes quickly because the Hudi table has no data yet. In Hudi 0.14.0, the 
SELECT query waits until the data has been loaded and then retrieves it.
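
One way to check the race described above is to make the SQL client wait for 
the INSERT job before it runs the SELECT. Flink's table.dml-sync option 
(default false) does this. The sketch below prepends that setting to a copy of 
the script; the output name flink-hudi-sync.sql is hypothetical, and the stub 
line exists only so the snippet runs when the attachment is not present. This 
is a diagnostic aid under those assumptions, not a confirmed fix.

```shell
# Assumption: flink-hudi.sql is the attached script from step 4.
# A one-line stub is created only if the file is missing, so the snippet runs standalone.
[ -f flink-hudi.sql ] || printf "SELECT 1;\n" > flink-hudi.sql

# Prepend the setting so each INSERT blocks until its job has finished.
{
  printf "SET 'table.dml-sync' = 'true';\n"
  cat flink-hudi.sql
} > flink-hudi-sync.sql

# Then run it as in step 4:
#   /usr/lib/flink/bin/sql-client.sh -f flink-hudi-sync.sql
```

If the SELECT returns the 8 rows with this setting enabled, that would be 
consistent with the SELECT having run before the INSERT committed.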

  was:
A Flink batch read from a Hudi table does not return any rows. The same Flink 
SQL script returns 8 rows as expected on Hudi 0.14.0.


*Repro Steps*

1. Use Flink 1.18.1 and Hudi 0.14.1

2. Start a Flink YARN session
{code}
flink-yarn-session -d \
  -D execution.checkpointing.interval=10s \
  -D state.checkpoint-storage=filesystem \
  -D state.checkpoints.dir=s3://prabhuflinks3/test-output/flink/output/20eab3b1-d58a-491c-8819-15e451a549eb
{code}

3. Place CSV Input Data
{code}
cat > data <<EOF
1,Danny,23
2,Stephen,33
3,Julian,53
4,Fabian,31
5,Sophia,18
6,Emma,20
7,Bob,44
8,Han,56
EOF

hadoop fs -mkdir -p \
  s3://prabhuflinks3/test-output/flink/output/8d007d79-913d-4ed4-a6e4-9af591f24c36/csvinput/
hadoop fs -put data \
  s3://prabhuflinks3/test-output/flink/output/8d007d79-913d-4ed4-a6e4-9af591f24c36/csvinput/
{code}

4. Run the attached Flink SQL script
{code}
/usr/lib/flink/bin/sql-client.sh -f flink-hudi-hive.sql
{code}


The script creates a Flink filesystem table over the CSV data (8 rows), creates 
a Hudi table, and inserts the rows from the filesystem table into it. Finally, 
it runs a SELECT query against the Hudi table. The SELECT query does not return 
any rows.


*Analysis*

The SELECT query and the INSERT query run concurrently. The SELECT query 
finishes quickly because the Hudi table has no data yet. In Hudi 0.14.0, the 
SELECT query waits until the data has been loaded and then retrieves it.

> Flink Batch Read from Hudi table does not return any rows
> ---------------------------------------------------------
>
>                 Key: HUDI-7354
>                 URL: https://issues.apache.org/jira/browse/HUDI-7354
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: flink-sql
>    Affects Versions: 0.14.1
>            Reporter: Prabhu Joseph
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
