DanielLeens commented on issue #10923:
URL: https://github.com/apache/seatunnel/issues/10923#issuecomment-4523790045

   Thanks for the concrete follow-up. With this new detail, this is no longer 
just a capability question; it is a real feature request.
   
   At the moment, the built-in `Http` source is still designed around text / 
JSON payloads:
   
   - the response body is materialized as `String`
   - the schema path does not expose a raw binary payload contract
   - and there is no built-in filename / content-type / chunk metadata model to 
pass downstream
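
   To illustrate why a `String` body is a problem for binary content, here is 
a minimal, standalone sketch. It is not the connector's actual code path, and 
the choice of UTF-8 is an assumption; it only shows that decoding arbitrary 
bytes as text and encoding them back does not preserve them.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class StringRoundTrip {
    // Decodes raw bytes as UTF-8 text and encodes them back,
    // the way a text-oriented pipeline would handle a response body.
    static byte[] roundTrip(byte[] raw) {
        String asText = new String(raw, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // First bytes of a typical JPEG file; 0xFF is never valid UTF-8.
        byte[] jpegHeader = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0};
        byte[] after = roundTrip(jpegHeader);
        // Prints "false": the invalid sequences were replaced during decoding.
        System.out.println(Arrays.equals(jpegHeader, after));
    }
}
```

   Plain ASCII or valid UTF-8 text survives this round trip, which is why the 
limitation only shows up once the payload is an image, archive, or similar.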
   
   So supporting PDFs, images, videos, ZIPs, Office documents, and streaming 
large files to a file sink would require more than a small connector tweak. 
It needs a clearer binary response contract and explicit design decisions 
around:
   
   1. how filename metadata is propagated (for example from 
`Content-Disposition`)
   2. whether the first phase is batch only, streaming only, or both
   3. whether chunking / block transfer is part of the first version, or 
deferred
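
   For the first decision, a rough sketch of filename extraction from a 
`Content-Disposition` header value could look like the following. The class 
and method names are hypothetical, not existing SeaTunnel APIs, and the sketch 
deliberately ignores the extended `filename*=` form, which a real 
implementation would probably need to handle.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of SeaTunnel.
final class ContentDispositionFilename {
    // Matches filename="quoted value" or filename=token.
    // The extended filename*=... parameter is intentionally not matched.
    private static final Pattern FILENAME =
            Pattern.compile("(?i)(?:^|;)\\s*filename\\s*=\\s*(?:\"([^\"]*)\"|([^;\\s]+))");

    static Optional<String> extract(String headerValue) {
        if (headerValue == null) {
            return Optional.empty();
        }
        Matcher m = FILENAME.matcher(headerValue);
        if (!m.find()) {
            return Optional.empty();
        }
        String name = m.group(1) != null ? m.group(1) : m.group(2);
        // Keep only the last path segment so the header cannot steer the sink path.
        int slash = Math.max(name.lastIndexOf('/'), name.lastIndexOf('\\'));
        name = name.substring(slash + 1).trim();
        if (name.isEmpty() || name.equals(".") || name.equals("..")) {
            return Optional.empty();
        }
        return Optional.of(name);
    }
}
```

   Returning `Optional` leaves the fallback naming policy (for example, 
deriving a name from the URL) as a separate design choice.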
   
   This looks reasonable to track as an enhancement, but I would strongly 
suggest narrowing the first scope. A practical MVP would be something like:
   
   1. batch-first raw binary download
   2. one binary payload field plus filename / content-type metadata
   3. no chunked resume / multi-part transfer in the first round
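
   As a rough illustration of that MVP shape, the sketch below models one 
binary payload field plus filename / content-type metadata and reads the whole 
body in a single pass. `BinaryHttpPayload` and `readFully` are hypothetical 
names chosen for this example, not a proposed or existing SeaTunnel contract.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical value type for the MVP: raw bytes plus minimal metadata.
final class BinaryHttpPayload {
    final byte[] content;
    final String fileName;    // may be null if the server sent no usable name
    final String contentType; // may be null if the header was absent

    BinaryHttpPayload(byte[] content, String fileName, String contentType) {
        this.content = content;
        this.fileName = fileName;
        this.contentType = contentType;
    }

    // Batch-first: buffers the entire body in memory, with no chunking or resume.
    static BinaryHttpPayload readFully(InputStream body, String fileName, String contentType) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        try {
            int n;
            while ((n = body.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new BinaryHttpPayload(buffer.toByteArray(), fileName, contentType);
    }
}
```

   Holding the full body in memory keeps the first version simple, but it is 
also the reason large-file and chunked transfer would be a later phase.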
   
   If you want to continue in this direction, keeping the issue focused 
on that first MVP will make it much easier for the community to evaluate 
and implement.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.