shunping commented on issue #31040: URL: https://github.com/apache/beam/issues/31040#issuecomment-2543429535
I briefly debugged the issue, it seems that file is accessed with decompressive transcoding (https://cloud.google.com/storage/docs/transcoding#decompressive_transcoding). I checked the metadata of the GCS file in use, and the result is as follows. ```bash $ gcloud storage objects describe gs://apache-beam-samples/gcs/bigfile_with_encoding_plain.txt.gz acl: ... bucket: apache-beam-samples content_encoding: gzip content_type: text/plain crc32c_hash: /REfiQ== creation_time: 2024-04-27T20:19:48+0000 ... metageneration: 1 name: gcs/bigfile_with_encoding_plain.txt.gz size: 7635685 storage_class: STANDARD storage_class_update_time: 2024-04-27T20:19:48+0000 storage_url: gs://apache-beam-samples/gcs/bigfile_with_encoding_plain.txt.gz#1714249188497359 update_time: 2024-04-27T20:19:48+0000 ``` The size field shows the size of the gzip file. The original file size prior to gzip should be 10100000. ``` $ wc bigfile_with_encoding_plain.txt 100000 252788 10100000 bigfile_with_encoding_plain.txt ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
