kennknowles commented on issue #18390: URL: https://github.com/apache/beam/issues/18390#issuecomment-1179313964
Bringing over some context from https://cloud.google.com/storage/docs/transcoding it seems like there are the following consistent situations: 1. GCS transcodes and Beam works with this transparently. - `Content-encoding: gzip` - `Content-type: X` - Beam's IO reads it expecting contents to be X. I believe the problem is that GCS serves metadata that results in wrong splits. 2. GCS does not transcode because the metadata is set to not transcode (current recommendation) - `Content-encoding: <empty>` - `Content-typ: gzip` - Beam's IO reads and the user specifies gzip or it is autodetected by the IO 3. GCS does not transcode because the Beam IO requests no transcoding - `Content-encoding: gzip` - `Content-type: X` - Beam's IO passes the header `Accept-Encoding: gzip` I believe 2 is the only one that works today. I am not sure if 1 is possible. I do think that 3 should be able to work, but needs some implementation. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
