pushpavanthar edited a comment on issue #969: [HUDI-251] JDBC incremental load 
to HUDI DeltaStreamer
URL: https://github.com/apache/incubator-hudi/pull/969#issuecomment-559230742
 
 
   Hi @vinothchandar and @taherk77 
   I would like to add two points to make this feature more generic:
   
   - [ ] We might need support for a combination of incrementing columns. 
Incrementing columns can be of the following types:
   1. Timestamp columns
   2. Auto Incrementing column
   3. Timestamp + Auto Incrementing.
   Instead of the code figuring out the incremental pull strategy, it would be 
better if the user provides it as config for each table.
   With timestamp incrementing columns, more than one column can contribute 
to the strategy. E.g. when a row is created, only `created_at` is set and 
`updated_at` is null by default. When the same row is updated, `updated_at` is 
assigned a timestamp. In such cases it is wise to consider both columns when 
forming the query. 
   
   - [ ] We need to sort rows by the above-mentioned incrementing columns 
to fetch rows in chunks (you can make use of `defaultFetchSize` in MySQL). I'm 
aware that sorting adds load on the database, but it helps in tracking the last 
pulled timestamp or auto-incrementing id, and helps retry/resume from the last 
recorded point. This will be a saviour during failures.
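
As a rough sketch of how a user-supplied, per-table strategy could drive the query (Python is used only for illustration; `IncrementalConfig`, `build_query` and the `:last_ts` / `:last_id` / `:now` placeholders are hypothetical names, not part of Hudi or DeltaStreamer):

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class IncrementalConfig:
    """Hypothetical per-table settings supplied by the user."""
    table: str
    timestamp_columns: Sequence[str] = ()      # first non-null column wins
    incrementing_column: Optional[str] = None  # e.g. an auto-increment id


def build_query(cfg: IncrementalConfig) -> str:
    """Return a parameterized SELECT for the strategy implied by cfg.

    Table and column names are assumed to come from trusted config.
    """
    cols = list(cfg.timestamp_columns)
    if len(cols) > 1:
        ts = f"COALESCE({', '.join(cols)})"
    else:
        ts = cols[0] if cols else None
    inc = cfg.incrementing_column

    if ts and inc:
        # Timestamp + auto-incrementing: the id breaks ties between rows
        # that share the last recorded timestamp.
        where = (f"{ts} < :now AND "
                 f"(({ts} = :last_ts AND {inc} > :last_id) OR {ts} > :last_ts)")
        order = f"{ts}, {inc}"
    elif ts:
        where = f"{ts} > :last_ts AND {ts} < :now"
        order = ts
    elif inc:
        where = f"{inc} > :last_id"
        order = inc
    else:
        raise ValueError("configure at least one incrementing column")
    return f"SELECT * FROM {cfg.table} WHERE {where} ORDER BY {order} ASC"


if __name__ == "__main__":
    print(build_query(IncrementalConfig(
        "inventory.customers", ("updated_at", "created_at"), "id")))
```

The point of the sketch is only that the strategy is picked by the config, not inferred by the code. The timestamp + id variant orders by both columns so a resume can continue inside a group of rows with the same timestamp.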
   
   A sample MySQL query for incrementing timestamp columns (`created_at` and 
`updated_at`) might look like:
   `SELECT * FROM inventory.customers WHERE 
COALESCE(inventory.customers.updated_at, inventory.customers.created_at) > 
$last_recorder_time AND 
COALESCE(inventory.customers.updated_at, inventory.customers.created_at) < 
$current_time ORDER BY 
COALESCE(inventory.customers.updated_at, inventory.customers.created_at) ASC`
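
To illustrate the chunked pull with resume, here is a minimal sketch that runs a query of the same shape against an in-memory SQLite table standing in for MySQL. The `pull_chunk` and `demo` helpers, the `LIMIT`-based chunking and the sample rows are all assumptions made for the example. It also assumes the effective timestamps are unique. With ties, a strict `>` on the checkpoint could skip rows at a chunk boundary, which is where the timestamp + auto-incrementing combination helps.

```python
import sqlite3

# Same shape as the sample query, with a LIMIT so each call returns one chunk.
CHUNK_QUERY = """
SELECT id, COALESCE(updated_at, created_at) AS effective_ts
FROM customers
WHERE COALESCE(updated_at, created_at) > :last_ts
  AND COALESCE(updated_at, created_at) < :now
ORDER BY effective_ts ASC
LIMIT :chunk_size
"""


def pull_chunk(conn, last_ts, now, chunk_size):
    """Return (rows, new_checkpoint) for the next chunk after last_ts."""
    rows = conn.execute(
        CHUNK_QUERY, {"last_ts": last_ts, "now": now, "chunk_size": chunk_size}
    ).fetchall()
    # Rows are sorted, so the last one carries the highest timestamp seen.
    checkpoint = rows[-1][1] if rows else last_ts
    return rows, checkpoint


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers "
        "(id INTEGER PRIMARY KEY, created_at TEXT, updated_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [
            (1, "2019-11-01 10:00:00", None),                   # never updated
            (2, "2019-11-01 09:00:00", "2019-11-03 08:00:00"),  # updated later
            (3, "2019-11-02 12:00:00", None),
            (4, "2019-11-04 07:00:00", None),
        ],
    )
    checkpoint, pulled = "2019-11-01 00:00:00", []
    while True:
        # In a real job the checkpoint would be persisted after each chunk,
        # so a restart resumes from the last recorded point.
        rows, checkpoint = pull_chunk(conn, checkpoint, "2019-11-05 00:00:00", 2)
        if not rows:
            break
        pulled.extend(row[0] for row in rows)
    return pulled, checkpoint


if __name__ == "__main__":
    print(demo())
```

Because the rows come back sorted, the last row of each chunk is a safe checkpoint: everything at or before it has already been handed over, and a retry only re-reads rows after it.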
