DanielCarter-stack commented on issue #10414:
URL: https://github.com/apache/seatunnel/issues/10414#issuecomment-3815857075
Thanks for the feature request. I've reviewed the codebase and need some
clarification to better understand the scope:
**Current state:**
- All file connectors (FTP, SFTP, HDFS, LocalFile) operate in **batch mode**
(`Boundedness.BOUNDED`) - see `BaseMultipleTableFileSource.java:62-63`
- File lists are built once at startup in
`FileSourceSplitEnumerator.java:79-83`, with no continuous monitoring
- `sync_mode=update` exists for incremental sync
(`FileBaseSourceOptions.java:133-142`) but:
  - Only supports `file_format_type=binary`
  - Only exposed for HDFS (`HdfsFile.md:83-87`)
  - Still batch-oriented, not real-time monitoring
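
To make "incremental sync" concrete for this discussion, here is a minimal sketch of the general idea: compare the source listing with the target and transfer only files that are missing or look changed. This is an illustration under stated assumptions, not SeaTunnel's actual `sync_mode=update` logic; the class `IncrementalSyncSketch`, the `FileMeta` type, and the size/modification-time comparison rule are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not SeaTunnel's sync_mode=update implementation.
class IncrementalSyncSketch {

    // Minimal file metadata: size in bytes and last-modified time (epoch millis).
    static final class FileMeta {
        final long size;
        final long modifiedMillis;

        FileMeta(long size, long modifiedMillis) {
            this.size = size;
            this.modifiedMillis = modifiedMillis;
        }
    }

    // Returns source paths that are missing on the target, differ in size,
    // or are newer than the target copy. Sorted by path for stable output.
    static List<String> filesToTransfer(Map<String, FileMeta> source,
                                        Map<String, FileMeta> target) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, FileMeta> e : new TreeMap<>(source).entrySet()) {
            FileMeta s = e.getValue();
            FileMeta t = target.get(e.getKey());
            if (t == null || t.size != s.size || s.modifiedMillis > t.modifiedMillis) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, FileMeta> source = Map.of(
            "a.csv", new FileMeta(100, 1_000),
            "b.csv", new FileMeta(200, 2_000),
            "c.csv", new FileMeta(300, 3_000));
        Map<String, FileMeta> target = Map.of(
            "a.csv", new FileMeta(100, 1_000),   // unchanged, skipped
            "b.csv", new FileMeta(150, 1_500));  // stale copy, transferred again
        System.out.println(filesToTransfer(source, target)); // prints [b.csv, c.csv]
    }
}
```

A comparison like this works for any file format because it never parses file contents, but a run is still one bounded pass over the listing.
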
**Questions:**
1. Do you require a **long-running streaming job** with continuous
monitoring (which would require architecture changes to support `UNBOUNDED`
mode), or is **external scheduling** (e.g., crontab/Airflow running periodic
batch jobs) acceptable?
2. Would extending `sync_mode=update` to FTP/SFTP/local files and supporting
all file formats meet your "avoid duplicate transfers" requirement?
3. Which features are **must-have** vs. **nice-to-have**: periodic scanning,
avoiding duplicates, post-transfer delete/archive, priority queues, concurrency
controls?
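
For context on what "periodic scanning" plus "avoiding duplicates" would involve at minimum, here is a small sketch: each scan produces a listing, and only paths not seen in earlier scans are handed out for transfer. The class `PeriodicScanSketch` and its `newFiles` method are hypothetical names for illustration, not SeaTunnel APIs, and the sketch assumes a file is identified by its path alone.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; names are illustrative and not part of SeaTunnel.
class PeriodicScanSketch {

    // Paths already handed out for transfer by earlier scans.
    private final Set<String> seen = new HashSet<>();

    // Given the listing from one scan, returns only the paths not seen before
    // (in listing order) and remembers them for later scans.
    List<String> newFiles(List<String> listing) {
        List<String> fresh = new ArrayList<>();
        for (String path : listing) {
            if (seen.add(path)) { // Set.add returns false if already present
                fresh.add(path);
            }
        }
        return fresh;
    }

    public static void main(String[] args) {
        PeriodicScanSketch tracker = new PeriodicScanSketch();
        // First scan: everything is new.
        System.out.println(tracker.newFiles(List.of("/in/a.csv", "/in/b.csv")));
        // prints [/in/a.csv, /in/b.csv]
        // Second scan: only the newly arrived file is returned.
        System.out.println(tracker.newFiles(List.of("/in/a.csv", "/in/b.csv", "/in/c.csv")));
        // prints [/in/c.csv]
    }
}
```

The main design question is where the `seen` set lives. With external scheduling it would have to be persisted between runs or replaced by a comparison against the target. In a long-running job it would presumably need to be part of the job's recoverable state. Tracking by path alone also misses files that are rewritten in place.
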