Re: [I] [Feature][Connector-V2][File] Add real-time monitoring and reading of file types such as FTP, SFTP, HDFS, local files, etc [seatunnel]

via GitHub Wed, 28 Jan 2026 23:03:13 -0800


DanielCarter-stack commented on issue #10414:
URL: https://github.com/apache/seatunnel/issues/10414#issuecomment-3815857075


   Thanks for the feature request. I've reviewed the codebase and need some 
clarification to better understand the scope:
   
   **Current state:**
   - All file connectors (FTP, SFTP, HDFS, LocalFile) operate in **batch mode** 
(`Boundedness.BOUNDED`) - see `BaseMultipleTableFileSource.java:62-63`
   - File lists are built once at startup in 
`FileSourceSplitEnumerator.java:79-83`, with no continuous monitoring
   - `sync_mode=update` exists for incremental sync 
(`FileBaseSourceOptions.java:133-142`) but:
     - Only supports `file_format_type=binary`
     - Only exposed for HDFS (`HdfsFile.md:83-87`)
     - Still batch-oriented, not real-time monitoring
   
   **Questions:**
   1. Do you require a **long-running streaming job** with continuous 
monitoring (which would need architecture changes to support 
`Boundedness.UNBOUNDED`), or is **external scheduling** (e.g., crontab/Airflow 
running periodic batch jobs) acceptable?
   
   2. Would extending `sync_mode=update` to FTP/SFTP/local files and supporting 
all file formats meet your "avoid duplicate transfers" requirement?
   
   3. Which features are **must-have** vs. **nice-to-have**: periodic scanning, 
avoiding duplicates, post-transfer delete/archive, priority queues, concurrency 
controls?
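
   To make "periodic scanning" and "avoiding duplicates" concrete, here is a 
minimal sketch of the idea for a local directory. It is not SeaTunnel code and 
not how `sync_mode=update` is implemented; `scan_once`, `monitor`, and the 
`(size, mtime)` fingerprint are hypothetical names and choices made only for 
illustration.

```python
import os
import time
from typing import Dict, Iterator, List, Tuple

# Hypothetical fingerprint (an assumption of this sketch):
# (size in bytes, modification time in nanoseconds).
Fingerprint = Tuple[int, int]


def scan_once(root: str, seen: Dict[str, Fingerprint]) -> List[str]:
    """Return files under root that are new or changed since the last scan.

    `seen` maps path -> fingerprint and is updated in place, so a file is
    reported again only if its size or mtime differs from the recorded one.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except FileNotFoundError:
                continue  # file vanished between listing and stat
            fp = (st.st_size, st.st_mtime_ns)
            if seen.get(path) != fp:
                seen[path] = fp
                changed.append(path)
    return sorted(changed)


def monitor(root: str, interval_s: float, rounds: int) -> Iterator[List[str]]:
    """Poll root `rounds` times, yielding the new/changed files of each scan."""
    seen: Dict[str, Fingerprint] = {}
    for i in range(rounds):
        yield scan_once(root, seen)
        if i < rounds - 1:
            time.sleep(interval_s)
```

   With external scheduling, each run would call something like `scan_once` 
and would have to persist `seen` between runs (for example in a state file). A 
long-running job would instead keep that state inside the job, as `monitor` 
does in memory here.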


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
