[GitHub] [beam] robertwb commented on a change in pull request #13489: Workaround for incremental in read_json.

GitBox Fri, 11 Dec 2020 15:49:27 -0800


robertwb commented on a change in pull request #13489:
URL: https://github.com/apache/beam/pull/13489#discussion_r541425726




##########
File path: sdks/python/apache_beam/dataframe/io.py
##########
@@ -59,6 +59,11 @@ def read_fwf(path, *args, **kwargs):
 
 
 def read_json(path, *args, **kwargs):
+  if 'nrows' in kwargs:
+    raise NotImplementedError('nrows not yet supported')
+  elif kwargs.get('lines', False):
+    # Work around https://github.com/pandas-dev/pandas/issues/34548.
+    kwargs = dict(kwargs, nrows=1 << 63)

Review comment:
       Due to a bug in pandas' read_json (the issue linked in the diff), the
entire file is read into memory at once unless both chunksize *and* nrows are set.
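As an illustration only, here is a minimal sketch of the kwargs adjustment the diff performs. `incremental_read_json_kwargs` and `_NO_ROW_LIMIT` are hypothetical names. Setting `chunksize` inside the helper is an added assumption: the comment says streaming requires both parameters, but the diff itself only injects `nrows`.

```python
from typing import Any, Dict

# Sentinel row limit: effectively "no limit", but non-None so that pandas
# takes the incremental code path (pandas issue #34548).
_NO_ROW_LIMIT = 1 << 63


def incremental_read_json_kwargs(
    kwargs: Dict[str, Any], chunksize: int
) -> Dict[str, Any]:
    """Return a copy of read_json kwargs adjusted for chunked reading.

    Hypothetical helper: mirrors the workaround in the diff and, as an
    assumption, also sets ``chunksize``, because the file is only streamed
    when both ``chunksize`` and ``nrows`` are passed.
    """
    if 'nrows' in kwargs:
        # A user-supplied nrows would conflict with the sentinel below.
        raise NotImplementedError('nrows not yet supported')
    if not kwargs.get('lines', False):
        # chunksize and nrows only apply to line-delimited JSON, so other
        # inputs are passed through unchanged (and read in one shot).
        return dict(kwargs)
    # Copy rather than mutate the caller's dict.
    return dict(kwargs, chunksize=chunksize, nrows=_NO_ROW_LIMIT)
```

With line-delimited input, `pandas.read_json(path, **incremental_read_json_kwargs({'lines': True}, 10000))` should then return a reader that yields one DataFrame per chunk instead of loading the whole file.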




