shunping commented on issue #31040:
URL: https://github.com/apache/beam/issues/31040#issuecomment-2543429535

   I briefly debugged the issue, it seems that file is accessed with 
decompressive transcoding  
(https://cloud.google.com/storage/docs/transcoding#decompressive_transcoding).
   
   I checked the metadata of the GCS file in use, and the result is as follows.
   ```bash
   $ gcloud storage objects describe 
gs://apache-beam-samples/gcs/bigfile_with_encoding_plain.txt.gz
   acl:
   ...
   bucket: apache-beam-samples
   content_encoding: gzip
   content_type: text/plain
   crc32c_hash: /REfiQ==
   creation_time: 2024-04-27T20:19:48+0000
   ...
   metageneration: 1
   name: gcs/bigfile_with_encoding_plain.txt.gz
   size: 7635685
   storage_class: STANDARD
   storage_class_update_time: 2024-04-27T20:19:48+0000
   storage_url: 
gs://apache-beam-samples/gcs/bigfile_with_encoding_plain.txt.gz#1714249188497359
   update_time: 2024-04-27T20:19:48+0000
   ```
   
   The size field shows the size of the gzip file. The original file size prior 
to gzip should be 10100000.
   ```
   $ wc bigfile_with_encoding_plain.txt
     100000  252788 10100000 bigfile_with_encoding_plain.txt
   ```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to