kennknowles commented on issue #18390:
URL: https://github.com/apache/beam/issues/18390#issuecomment-1179313964

   Bringing over some context from 
https://cloud.google.com/storage/docs/transcoding it seems like there are the 
following consistent situations:
   
   1. GCS transcodes and Beam works with this transparently.
      - `Content-encoding: gzip`
      - `Content-type: X`
      - Beam's IO reads it expecting contents to be X. I believe the problem is 
that GCS serves metadata that results in wrong splits.
   2. GCS does not transcode because the metadata is set to not transcode 
(current recommendation)
       - `Content-encoding: <empty>`
       - `Content-typ: gzip`
       - Beam's IO reads and the user specifies gzip or it is autodetected by 
the IO
   3. GCS does not transcode because the Beam IO requests no transcoding
       - `Content-encoding: gzip`
       - `Content-type: X`
       - Beam's IO passes the header `Accept-Encoding: gzip`
   
   I believe 2 is the only one that works today. I am not sure if 1 is 
possible. I do think that 3 should be able to work, but needs some 
implementation.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to