DanielLeens commented on issue #10923:
URL: https://github.com/apache/seatunnel/issues/10923#issuecomment-4538943252

Thanks, this follow-up makes the scope much clearer.

With the concrete batch-first scenario, config sketch, and the large-file constraint, I agree this should stay open as a feature request rather than a question thread.

After checking the current source path, the main gap is on the `Http` source side:

- the response is still materialized as a `String`
- the schema path only supports JSON
- there is no built-in filename / content-type / part metadata contract to pass downstream

One important implementation detail here is that SeaTunnel already has an existing binary row model on the file side: the binary file sink expects `data`, `relativePath`, and `partIndex` semantics rather than an arbitrary text payload. So the cleanest first version may be to make `Http` binary mode align with that existing binary contract, instead of inventing a separate one only for `Http`.

Given your latest details, a practical first phase would be:

1. batch mode first
2. explicit `format = binary`
3. filename/path propagation with clear precedence between `file_path_expression` and `Content-Disposition`
4. chunked emission / streaming write semantics so large files do not have to be fully materialized in memory

Then keep retry/resume, richer streaming semantics, and more advanced multipart behavior for a later phase.

This looks useful enough to keep open as a feature enhancement. We've labeled it `help wanted` so contributors can pick it up, but keeping the first version tightly scoped will make it much easier to land.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
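
As a rough illustration of phase-one items 3 and 4 in the comment above, the following is a minimal, standalone Java sketch and not SeaTunnel code. `HttpBinaryChunkSketch`, `BinaryRow`, `resolvePath`, and `emitChunks` are hypothetical names. The row shape only mirrors the `data` / `relativePath` / `partIndex` semantics mentioned in the comment. The precedence shown (configured path first, then the `Content-Disposition` filename, then a fallback) is one possible choice, since the comment only asks that the precedence be clear. Starting `partIndex` at 0 is likewise an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

final class HttpBinaryChunkSketch {

    /** Hypothetical row shape mirroring data / relativePath / partIndex. */
    static final class BinaryRow {
        final byte[] data;
        final String relativePath;
        final long partIndex;

        BinaryRow(byte[] data, String relativePath, long partIndex) {
            this.data = data;
            this.relativePath = relativePath;
            this.partIndex = partIndex;
        }
    }

    /**
     * Assumed precedence: an explicitly configured path wins, then the
     * Content-Disposition "filename" parameter, then the fallback.
     * Simplified parsing: ignores "filename*" and quoted semicolons.
     */
    static String resolvePath(String configuredPath, String contentDisposition, String fallback) {
        if (configuredPath != null && !configuredPath.isEmpty()) {
            return configuredPath;
        }
        if (contentDisposition != null) {
            for (String param : contentDisposition.split(";")) {
                String p = param.trim();
                if (p.regionMatches(true, 0, "filename=", 0, 9)) {
                    String name = p.substring(9).trim();
                    if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                        name = name.substring(1, name.length() - 1);
                    }
                    if (!name.isEmpty()) {
                        return name;
                    }
                }
            }
        }
        return fallback;
    }

    /**
     * Reads the body in fixed-size chunks and emits one row per chunk, so at
     * most chunkSize bytes are buffered here at a time. Returns the number of
     * rows emitted. partIndex starts at 0 (an assumption of this sketch).
     */
    static long emitChunks(InputStream body, String relativePath, int chunkSize,
                           Consumer<BinaryRow> collector) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        byte[] buffer = new byte[chunkSize];
        long partIndex = 0;
        try {
            int filled;
            // readNBytes (Java 9+) fills the buffer unless the stream ends; it returns 0 at end of stream.
            while ((filled = body.readNBytes(buffer, 0, chunkSize)) > 0) {
                collector.accept(new BinaryRow(Arrays.copyOf(buffer, filled), relativePath, partIndex++));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return partIndex;
    }

    public static void main(String[] args) {
        String path = resolvePath(null, "attachment; filename=\"report.pdf\"", "download.bin");
        List<BinaryRow> rows = new ArrayList<>();
        emitChunks(new ByteArrayInputStream(new byte[10]), path, 4, rows::add);
        for (BinaryRow row : rows) {
            System.out.println(row.relativePath + " part " + row.partIndex + " bytes " + row.data.length);
        }
    }
}
```

A real connector would also need to sanitize a server-supplied filename (for example, rejecting path separators) before using it as a relative path, and would emit rows through the engine's collector rather than a plain `Consumer`.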
