wuchunfu commented on issue #10414:
URL: https://github.com/apache/seatunnel/issues/10414#issuecomment-3817242203

   > [@wuchunfu](https://github.com/wuchunfu) Hello, I’d like to confirm the
   > specific requirements with you. Please add any details if there are
   > discrepancies in our understanding.
   >
   > * You want a real-time/continuous file ingestion capability for file
   >   sources (FTP/SFTP/HDFS/local, etc.) that (a) periodically scans/monitors
   >   a directory for new/updated files, (b) avoids re-processing files
   >   already transferred, and (c) supports post-actions such as delete after
   >   transfer, backup then delete, and delete by retention/expiration, plus
   >   tunables such as scan interval, priority queue/queue size/buffer size,
   >   and thread/concurrency. Is that accurate? Are you targeting a
   >   long-running streaming job, or a “run periodically” batch job?
   > * Current status: SeaTunnel already has a limited incremental sync mode,
   >   sync_mode=update, but it is only supported for file_format_type=binary,
   >   and it is currently only exposed/usable in the HdfsFile source
   >   (FTP/SFTP/local do not expose the sync_mode=update option yet). It is
   >   also batch-style (the file list is built at startup), not real-time
   >   monitoring.
   
   @yzeng1618 Thank you for your reply. Your understanding is correct. We
need a directory that is scanned periodically or monitored continuously, and
files that have already been read or transferred should not be transferred
again. If you are interested, I can assign this issue to you.
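
The requirement discussed above (scan a directory on an interval, skip files that were already transferred, then apply a post-action) could be sketched roughly as follows. This is only an illustration of the idea in Python, not SeaTunnel code or a proposed design: `DirectoryScanner`, `scan_once`, `backup_dir`, and `delete_after` are hypothetical names, and using modification time plus size as the "already transferred" fingerprint is an assumption.

```python
import shutil
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class DirectoryScanner:
    """Hypothetical poll-based scanner; names and behavior are assumptions."""

    root: Path
    backup_dir: Optional[Path] = None  # if set, copy the file here after transfer
    delete_after: bool = False         # if True, delete the source file after transfer
    # path -> (mtime_ns, size) for files already handed to `transfer`
    seen: Dict[Path, Tuple[int, int]] = field(default_factory=dict)

    def scan_once(self, transfer: Callable[[Path], None]) -> List[Path]:
        """Transfer files that are new or changed since the previous scan."""
        processed: List[Path] = []
        for path in sorted(p for p in self.root.iterdir() if p.is_file()):
            stat = path.stat()
            fingerprint = (stat.st_mtime_ns, stat.st_size)
            if self.seen.get(path) == fingerprint:
                continue  # already transferred and unchanged, so skip it
            transfer(path)
            self.seen[path] = fingerprint
            self._post_action(path)
            processed.append(path)
        return processed

    def _post_action(self, path: Path) -> None:
        """Apply 'backup', 'delete', or 'backup then delete' after a transfer."""
        if self.backup_dir is not None:
            self.backup_dir.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, self.backup_dir / path.name)
        if self.delete_after:
            path.unlink()
```

A long-running job would call `scan_once` in a loop and sleep for the configured scan interval between calls, while a "run periodically" batch job would call it once per run and would need to persist `seen` between runs.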


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
