DanielCarter-stack commented on issue #10565: URL: https://github.com/apache/seatunnel/issues/10565#issuecomment-4003697695
<!-- code-pr-reviewer -->

This issue appears to be related to how `file_filter_pattern` interacts with `file_format_type=BINARY` in SftpFile.

**Analysis of the root cause:**

When `file_filter_pattern` filters out all files (or matches zero files), the schema generation logic in `BaseFileSourceConfig.parseCatalogTable()` (lines 86-116) may not correctly handle the empty `filePaths` case for BINARY format. This can cause a schema mismatch that triggers the error you're seeing in `BinaryWriteStrategy.setCatalogTable()` (lines 54-60), which validates that the incoming schema matches `BinaryReadStrategy.binaryRowType`.

**Your current configuration may be causing the pattern to match zero files:**

- Your `file_filter_pattern = "/opt/module/qingyang/.*\\.pdf"` includes the full absolute path
- The matching logic in `AbstractReadStrategy.filterFileByPattern()` (lines 523-536) performs pattern matching against the full absolute path
- Try using: `file_filter_pattern = ".*\\.pdf"` instead

**To help diagnose this further, could you provide:**

1. The full job configuration (sanitized)
2. Whether the `/opt/module/qingyang` directory definitely contains `.pdf` files
3. Logs showing the parsed configuration and any warnings about filtered files

**Related code locations:**

- `connector-file-base/.../BaseFileSourceConfig.java:86-116` - schema parsing logic
- `connector-file-base/.../BinaryWriteStrategy.java:54-60` - schema validation
- `connector-file-base/.../AbstractReadStrategy.java:523-536` - file filtering

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
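
To check locally whether a given `file_filter_pattern` can match zero files, the following minimal sketch may help. It is not the SeaTunnel implementation: `FilterPatternSketch` and `filterByPattern` are hypothetical names, the sample paths are invented, and it assumes the filter uses `java.util.regex` full-match semantics against whatever string the connector passes in. Comparing the results for a full path and for a bare file name should show which form the pattern needs.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class FilterPatternSketch {

    // Hypothetical helper (assumption): keep only the candidates that the
    // regex matches in full, as Matcher.matches() requires.
    static List<String> filterByPattern(List<String> candidates, String regex) {
        Pattern pattern = Pattern.compile(regex);
        return candidates.stream()
                .filter(c -> pattern.matcher(c).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Invented sample paths under the directory named in the issue.
        List<String> fullPaths = List.of(
                "/opt/module/qingyang/a.pdf",
                "/opt/module/qingyang/sub/b.pdf",
                "/opt/module/qingyang/c.txt");

        // Against full absolute paths, both patterns keep the two .pdf files.
        System.out.println(filterByPattern(fullPaths, "/opt/module/qingyang/.*\\.pdf"));
        System.out.println(filterByPattern(fullPaths, ".*\\.pdf"));

        // Against bare file names, only the short pattern can match.
        List<String> names = List.of("a.pdf", "c.txt");
        System.out.println(filterByPattern(names, "/opt/module/qingyang/.*\\.pdf"));
        System.out.println(filterByPattern(names, ".*\\.pdf"));
    }
}
```

If the full-path pattern returns an empty list for the strings the connector actually compares against, the empty `filePaths` case described in the analysis becomes the likely trigger.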
