liferoad commented on issue #31040:
URL: https://github.com/apache/beam/issues/31040#issuecomment-2081172626
I see.
```
# standard libraries
import logging

# third party libraries
import apache_beam as beam
from apache_beam import Create, Map
from apache_beam.io.textio import ReadAllFromText
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms.combiners import Count

logger = logging.getLogger()
logger.setLevel(logging.INFO)

elements = [
    # "gs://apache-beam-samples/gcs/bigfile.txt.gz",
    # "gs://apache-beam-samples/gcs/bigfile_with_encoding.txt.gz",
    "gs://apache-beam-samples/gcs/bigfile_with_encoding_plain.txt.gz",
]

options = PipelineOptions()

with beam.Pipeline(options=options) as p:
    (
        p
        | Create(elements)
        | "Read File from GCS"
        >> ReadAllFromText(
            compression_type=beam.io.filesystem.CompressionTypes.UNCOMPRESSED
        )
        | Count.Globally()
        | "Log" >> Map(lambda x: logging.info("Total lines %d", x))
    )
```
This only loads 75,601 lines.
#19413 could be related to how the file was uploaded to GCS.
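
One way to cross-check the count, assuming a local copy of the file is available, is to count its lines with Python's standard `gzip` module. This is only a minimal sketch: the helper name `count_lines` and the local path in the usage note are hypothetical, and it sniffs the gzip magic bytes because the `.gz` suffix may not reflect how the bytes are actually stored.

```python
import gzip


def count_lines(path: str) -> int:
    """Count lines in a local file, decompressing it if it is gzip data.

    Hypothetical helper for comparison with the pipeline's count.
    """
    # Gzip streams start with the magic bytes 0x1f 0x8b; check them
    # instead of trusting the file extension.
    with open(path, "rb") as raw:
        is_gzip = raw.read(2) == b"\x1f\x8b"

    opener = gzip.open if is_gzip else open
    with opener(path, "rb") as f:
        # Iterating a binary file object yields one chunk per line.
        return sum(1 for _ in f)
```

For example, `count_lines("bigfile_with_encoding_plain.txt.gz")` on a downloaded copy (hypothetical local path) gives a number to compare against the 75,601 lines reported by the pipeline.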