comicfans commented on issue #35105: URL: https://github.com/apache/arrow/issues/35105#issuecomment-1507910900
It seems this is a more general problem: pyarrow also gives the same error for a file it generated itself. I've attached a good input file for testing: [sample.zip](https://github.com/apache/arrow/files/11228777/sample.zip). Unzip it and run the following Python code:

```python
from pyarrow import parquet

a = parquet.read_table('sample.parquet')
parquet.write_table(a, "bug.parquet",
                    use_dictionary=["contract_name"],
                    use_byte_stream_split=["last_price", "bid_price1"])
parquet.read_table('bug.parquet')
```

This fails with:

```
ore.py", line 2601, in read
    table = self._dataset.to_table(
            ^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow/_dataset.pyx", line 369, in pyarrow._dataset.Dataset.to_table
  File "pyarrow/_dataset.pyx", line 2818, in pyarrow._dataset.Scanner.to_table
  File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 115, in pyarrow.lib.check_status
OSError: Data size too small for number of values (corrupted file?)
```
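For what it's worth, here is a minimal sketch to help narrow the trigger down, assuming the default encodings are unaffected (the `ok.parquet` output name is just illustrative): rewriting the same table without `use_byte_stream_split` should round-trip cleanly if only the split encoding is at fault.

```python
from pyarrow import parquet

# Rewrite the same table without BYTE_STREAM_SPLIT to check whether the
# round trip succeeds when only dictionary encoding is requested.
a = parquet.read_table('sample.parquet')
parquet.write_table(a, 'ok.parquet', use_dictionary=['contract_name'])
parquet.read_table('ok.parquet')  # expected to succeed if only the split encoding triggers the error
```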
