wjones1 commented on pull request #6979: URL: https://github.com/apache/arrow/pull/6979#issuecomment-619463693
I found the cause of the test failure: if the `batch_size` isn't aligned with the `chunk_size`, categorical columns fail with the error:

```
pyarrow.lib.ArrowNotImplementedError: This class cannot yet iterate chunked arrays
```

I think this means categorical columns/DictionaryArray columns aren't supported by this method for now, unless you are able to align the `batch_size` with the `chunk_size`. Is it possible or even common that `chunk_size` might be variable within a file?

(The reason we were seeing the error in Python 3.5 and not in later Python versions is that I was selecting a subset of columns using indices, and the ordering of columns changed between Python versions, I think because of the change in dictionary ordering in 3.6+. I've instead changed the offending test to run on all columns.)
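To make "aligned" concrete, here is a minimal pure-Python sketch. It assumes `chunk_size` means the number of rows per chunk (row group) and that batches are cut every `batch_size` rows across the whole file, which may not match what the reader actually does. `straddling_batches` is a hypothetical helper for illustration, not part of pyarrow; it lists the batches that would have to draw rows from more than one chunk, which is the case that seems to trigger the error above.

```python
from itertools import accumulate


def straddling_batches(row_group_sizes, batch_size):
    """Return (start, stop) row ranges of batches that cross a chunk boundary.

    Hypothetical helper for illustration only; not a pyarrow API.
    Assumes batches are cut every `batch_size` rows across the whole file.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")

    total_rows = sum(row_group_sizes)
    # Interior boundaries: cumulative row counts, excluding the end of the file.
    boundaries = [b for b in accumulate(row_group_sizes) if 0 < b < total_rows]

    straddling = []
    for start in range(0, total_rows, batch_size):
        stop = min(start + batch_size, total_rows)
        # A batch straddles when a boundary falls strictly inside it.
        if any(start < b < stop for b in boundaries):
            straddling.append((start, stop))
    return straddling


# Aligned: 50 divides every chunk size, so no batch crosses a boundary.
print(straddling_batches([100, 100, 100], 50))
# Misaligned: some batches span two chunks.
print(straddling_batches([100, 100, 100], 60))
# Variable chunk sizes: even batch_size=100 straddles the boundary at row 170.
print(straddling_batches([100, 70, 130], 100))
```

Under these assumptions, a fixed `batch_size` stays aligned only if it evenly divides every chunk size in the file, which is why variable chunk sizes would matter here.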
