shunping commented on issue #31040:
URL: https://github.com/apache/beam/issues/31040#issuecomment-2569764217

   To clarify how textio behaves under different content-encoding, 
content-type, and compression settings, I've expanded the table from Apache 
Beam issue 
[#18390](https://github.com/apache/beam/issues/18390#issuecomment-1422729486). 
The table compares the behavior of two Beam SDK versions: 2.52.0 (prior to 
the GCSIO migration) and 2.62.0 (the upcoming release). The last column shows 
the proposed behavior with my fix.
   
   
![image](https://github.com/user-attachments/assets/629a95f4-d72c-47ce-9ab6-8d7e180fa3d2)
   
   A few notes on how the data was generated:
   - For the first 3 x 2 x 3 rows, the text data is gzipped locally and then 
uploaded to GCS. The `content-type` and `content-encoding` metadata values 
are then adjusted manually.
   - For the row marked as "copy default text file", the text data is 
copied/uploaded directly to GCS without gzip.
   - For the row marked as "copy default gzip file", the gzipped text data is 
copied/uploaded to GCS.
   - For the row marked as "copy default text file with gzip-local flag", the 
**text data** is uploaded to GCS with that flag: 
    `gcloud storage cp -Z ./textio-test-data.1k.txt gs://apache-beam-samples/textio/textio-test-data.gzip-local.1k.txt.gz`
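
As a rough illustration of the "gzipped locally" step in the notes above, the following Python sketch compresses a text file and checks whether the stored bytes are actually gzip, independent of any object metadata. It uses only the standard library and does not touch GCS. The helper names (`gzip_locally`, `is_gzip_payload`, `demo`) and the generated sample contents are assumptions made for illustration, not the procedure actually used to build the table.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def gzip_locally(src_path: str) -> str:
    """Write a gzipped copy of src_path next to it and return the new path."""
    dst_path = src_path + ".gz"
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        dst.write(src.read())
    return dst_path


def is_gzip_payload(path: str) -> bool:
    """Check the stored bytes themselves, ignoring any metadata or file name."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def demo() -> tuple:
    """Build a small text file, gzip it, and inspect both files."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-in for the real textio-test-data.1k.txt contents.
        txt_path = os.path.join(tmp, "textio-test-data.1k.txt")
        lines = [f"line {i}\n" for i in range(1000)]
        with open(txt_path, "w", encoding="utf-8") as f:
            f.writelines(lines)

        gz_path = gzip_locally(txt_path)
        with gzip.open(gz_path, "rt", encoding="utf-8") as f:
            round_trip = f.readlines()

        return (
            is_gzip_payload(txt_path),  # plain text: no gzip magic
            is_gzip_payload(gz_path),   # gzipped copy: gzip magic present
            round_trip == lines,        # decompression restores the text
        )


if __name__ == "__main__":
    print(demo())
```

A check like `is_gzip_payload` can help when comparing rows of the table, since the payload of an object may be gzip or plain text regardless of what its `content-type` and `content-encoding` metadata say.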
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
