robertwb commented on a change in pull request #13489:
URL: https://github.com/apache/beam/pull/13489#discussion_r541425726
##########
File path: sdks/python/apache_beam/dataframe/io.py
##########
@@ -59,6 +59,11 @@ def read_fwf(path, *args, **kwargs):
 def read_json(path, *args, **kwargs):
+  if 'nrows' in kwargs:
+    raise NotImplementedError('nrows not yet supported')
+  elif kwargs.get('lines', False):
+    # Work around https://github.com/pandas-dev/pandas/issues/34548.
+    kwargs = dict(kwargs, nrows=1 << 63)
Review comment:
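Due to the pandas bug linked in the code comment, `read_json` with
`lines=True` reads the entire file into memory at once unless both
`chunksize` *and* `nrows` are set. Passing a very large `nrows`
satisfies that condition without actually limiting the rows read.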
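
For illustration, here is a minimal standalone sketch of the same workaround using plain pandas, outside of Beam. The helper names `read_json_lines_in_chunks` and `write_sample` and the sample data are hypothetical, and it assumes a pandas version that supports the `nrows` argument to `read_json`.

```python
import json
import os
import tempfile

import pandas as pd


def read_json_lines_in_chunks(path, chunksize):
    """Yield DataFrames of at most `chunksize` rows from a JSON Lines file.

    Both chunksize and nrows are passed, mirroring the workaround in the
    diff: nrows is set so large that it never truncates the input.
    """
    reader = pd.read_json(path, lines=True, chunksize=chunksize, nrows=1 << 63)
    for chunk in reader:
        yield chunk


def write_sample(path, n):
    """Write n small JSON records, one per line (hypothetical test data)."""
    with open(path, 'w') as f:
        for i in range(n):
            f.write(json.dumps({'id': i, 'value': i * i}) + '\n')


if __name__ == '__main__':
    sample_path = os.path.join(tempfile.mkdtemp(), 'sample.jsonl')
    write_sample(sample_path, 5)
    for chunk in read_json_lines_in_chunks(sample_path, chunksize=2):
        print(len(chunk), list(chunk['id']))
```

Whether the whole file is still buffered internally depends on the installed pandas version, so this only shows the call shape, not a guarantee about memory use.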