wuchunfu commented on issue #10414: URL: https://github.com/apache/seatunnel/issues/10414#issuecomment-3817242203
> [@wuchunfu](https://github.com/wuchunfu) Hello, I’d like to confirm the specific requirements with you. Please feel free to add details if our understanding differs from yours.
>
> * You want a real-time/continuous file ingestion capability for file sources (FTP/SFTP/HDFS/local, etc.) that (a) periodically scans/monitors a directory for new/updated files, (b) avoids re-processing files already transferred, and (c) supports post-actions like delete after transfer, backup then delete, and delete by retention/expiration, plus tunables like scan interval, priority queue/queue size/buffer size, and thread/concurrency. Is that accurate? Are you targeting a long-running streaming job, or a “run periodically” batch job?
> * Current status: SeaTunnel already has a limited incremental sync mode, sync_mode=update, but it is only supported for file_format_type=binary, and it is currently only exposed/usable in the HdfsFile source (FTP/SFTP/local do not expose the sync_mode=update option yet). It is also batch-style (the file list is built at startup), not real-time monitoring.

@yzeng1618 Thank you for your reply. Your understanding is correct. We need a directory to be scanned periodically or monitored, and files that have already been read or transferred should not be transferred again. If you are interested, I can assign this issue to you.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
