AlenkaF commented on PR #33761: URL: https://github.com/apache/arrow/pull/33761#issuecomment-1404700904
> #14355 indicates multiple threads can be used but there's no explicit documentation at `docs/source/cpp/json.rst`.

I think the reference docs have all the info:

https://arrow.apache.org/docs/dev/cpp/api/formats.html#_CPPv4N5arrow4json11ReadOptionsE
https://arrow.apache.org/docs/dev/cpp/api/formats.html#_CPPv4N5arrow4json15StreamingReaderE

It makes sense that the CSV streaming reader has [a note that only single-threaded reading is supported](https://arrow.apache.org/docs/python/csv.html#incremental-reading), and that there are no additional notes on `use_threads` in [Reading JSON files](https://arrow.apache.org/docs/dev/cpp/json.html#reading-json-files), since [ReadOptions](https://arrow.apache.org/docs/dev/cpp/api/formats.html#_CPPv4N5arrow4json11ReadOptionsE) is linked from there.

> The tests I've written in test_json.py pass if use_threads is False or True. It looks like `cpp/src/arrow/json/reader_test.cc` has more specific tests for the AsyncStreamingReader.

Oh yes, that is true.

> 1. Do I need to write additional tests more specific to read_options.use_threads = True?

I think you can add something similar to `TestSerialJSONRead` and `TestParallelJSONRead`. @jorisvandenbossche what do you think?

> 2. Do I need to include documentation on read_options.use_threads?

I think it is well documented in [pa.json.ReadOptions](https://arrow.apache.org/docs/python/generated/pyarrow.json.ReadOptions.html#pyarrow.json.ReadOptions). But you can definitely mention the possibility of using multiple threads in JSON incremental reading [in the general docs](https://arrow.apache.org/docs/dev/python/json.html).

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
