liferoad commented on issue #31040:
URL: https://github.com/apache/beam/issues/31040#issuecomment-2081172626

   I see. 
   
   ```
   # standard libraries
   import logging
   
   # third party libraries
   import apache_beam as beam
   from apache_beam import Create, Map
   from apache_beam.io.textio import ReadAllFromText
   from apache_beam.options.pipeline_options import PipelineOptions
   from apache_beam.transforms.combiners import Count
   
   logger = logging.getLogger()
   logger.setLevel(logging.INFO)
   
   elements = [
       # "gs://apache-beam-samples/gcs/bigfile.txt.gz",
       # "gs://apache-beam-samples/gcs/bigfile_with_encoding.txt.gz",
       "gs://apache-beam-samples/gcs/bigfile_with_encoding_plain.txt.gz",
   ]
   
   options = PipelineOptions()
   
   with beam.Pipeline(options=options) as p:
       (
           p
           | Create(elements)
           | "Read File from GCS"
           >> ReadAllFromText(
               compression_type=beam.io.filesystem.CompressionTypes.UNCOMPRESSED
           )
           | Count.Globally()
           | "Log" >> Map(lambda x: logging.info("Total lines %d", x))
       )
   ```
   This loads only 75,601 lines.
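   To see how far short 75,601 lines falls, it helps to know the file's true line count. Below is a minimal local sketch that assumes you have a copy of the object on disk. The helper name `count_lines` and any path you pass to it are hypothetical. The helper checks for the gzip magic bytes, so it works whether the local copy was saved compressed or already decompressed.

   ```python
   import gzip


   def count_lines(path: str) -> int:
       """Count lines in a local text file, gzip-compressed or plain (hypothetical helper)."""
       # Peek at the first two bytes: gzip streams start with 0x1f 0x8b.
       with open(path, "rb") as raw:
           magic = raw.read(2)
       opener = gzip.open if magic == b"\x1f\x8b" else open
       # Binary mode avoids any text-encoding assumptions; a final line
       # without a trailing newline is still counted.
       with opener(path, "rb") as f:
           return sum(1 for _ in f)
   ```

   For example, `count_lines("bigfile_with_encoding_plain.txt.gz")` on a local copy gives a number to compare against the pipeline's count.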
   
   #19413 could be related, with respect to how the file was uploaded to GCS.
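   One explanation consistent with a truncated count is GCS decompressive transcoding. This is an assumption and is not confirmed in this comment. An object uploaded with `Content-Encoding: gzip` is stored compressed, and its recorded size is the compressed size. However, it is served decompressed, so a reader that stops at the recorded size sees only a prefix of the text. The toy model below shows the effect with an in-memory gzip round trip. It involves no Beam or GCS code, and the function name `simulate_truncated_read` is hypothetical.

   ```python
   import gzip
   from typing import Tuple


   def simulate_truncated_read(text: str) -> Tuple[int, int]:
       """Toy model (hypothetical, not Beam code) of reading a transcoded object.

       Returns (lines in the full text, lines seen by a reader that stops after
       the compressed size's worth of decompressed bytes).
       """
       plain = text.encode("utf-8")
       # Stored object: gzip bytes; its recorded size is their length.
       stored_size = len(gzip.compress(plain))
       # Served object after decompressive transcoding: the plain bytes.
       served = plain
       # Assumed reader behaviour: trust the recorded size and stop there.
       # A partially read final line still counts as one line.
       seen = served[:stored_size]
       return len(plain.splitlines()), len(seen.splitlines())


   full, seen = simulate_truncated_read("".join(f"line {i}\n" for i in range(10000)))
   print(f"{seen} of {full} lines read")
   ```

   Because text compresses well, the recorded size is much smaller than the served size, and only a fraction of the lines come through.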


-- 
This is an automated message from the Apache Git Service.