yyh2954360585 opened a new issue, #9471:
URL: https://github.com/apache/hudi/issues/9471
**Describe the problem you faced**

Q1:
Assume the source table `order` has a total data volume of 5 million rows, and it is synchronized with DeltaStreamer's JdbcSource using this config:

```
--hoodie-conf hoodie.deltastreamer.jdbc.incr.pull=true
--hoodie-conf hoodie.deltastreamer.jdbc.table.incr.column.name=update_date
--source-limit 100000
--continuous
```

When DeltaStreamer has synchronized 400,000 rows, the current lastCheckpoint is `2023-08-17 14:55:00`, so the SQL that the incrementalFetch method uses to query the source data is:

```sql
select * from (select * from order where update_date > "2023-08-17 14:55:00" order by update_date limit 100000) rdbms_table
```

Now assume my `order` table contains 200,000 rows whose `update_date` is all equal to the same later value, e.g. `2023-08-17 15:55:00`. Because sourceLimit=100000, only 100,000 of those rows are fetched and the checkpoint advances to that value; the next incremental query filters with `update_date > lastCheckpoint`, so the remaining 100,000 rows with the equal timestamp are skipped and lost.
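The scenario above can be reduced to a minimal sketch (hypothetical code, not Hudi's actual implementation) showing how a strictly-greater-than checkpoint filter combined with a row limit drops rows that share the same incremental-column value:

```python
def incremental_fetch(rows, checkpoint, limit):
    """Mimic an incremental pull of the form:
    SELECT ... WHERE incr_col > checkpoint ORDER BY incr_col LIMIT limit.
    Returns (fetched_batch, new_checkpoint)."""
    batch = sorted(r for r in rows if r > checkpoint)[:limit]
    # The checkpoint advances to the max value seen in this batch.
    new_checkpoint = batch[-1] if batch else checkpoint
    return batch, new_checkpoint

# 200,000 rows all carrying the same update_date, scaled down to 200 here,
# with a source limit of 100 (timestamps are illustrative).
rows = ["2023-08-17 15:55:00"] * 200
checkpoint = "2023-08-17 14:55:00"

batch1, checkpoint = incremental_fetch(rows, checkpoint, limit=100)
batch2, checkpoint = incremental_fetch(rows, checkpoint, limit=100)

print(len(batch1))  # 100 rows fetched; checkpoint is now 2023-08-17 15:55:00
print(len(batch2))  # 0 -- the remaining 100 rows with the equal timestamp are skipped
```

The second fetch returns nothing because the `>` comparison excludes every row equal to the new checkpoint, which is exactly the lost-data case described above.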
Q2:
Why are these two parameters designed to work this way?
**Environment Description**
* Hudi version: 0.13.1
* Spark version: 3.2.1
* Hive version: 3.1.3
* Hadoop version: 3.3.3
* Storage (HDFS/S3/GCS..): HDFS
* Running on Docker? (yes/no): no
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]