[
https://issues.apache.org/jira/browse/ARROW-17583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Joost Hoozemans updated ARROW-17583:
------------------------------------
Description:
When writing a large parquet file (e.g. 5GB) using pyarrow.dataset, it throws
an exception:
Traceback (most recent call last):
File "pyarrow/_dataset_parquet.pyx", line 165, in
pyarrow._dataset_parquet.ParquetFileFormat._finish_write
File "pyarrow/{_}dataset.pyx", line 2695, in
pyarrow._dataset.WrittenFile.{_}{_}init{_}_
OverflowError: value too large to convert to int
Exception ignored in: 'pyarrow._dataset._filesystemdataset_write_visitor'
The file is written successfully, though. It seems related to this issue:
https://issues.apache.org/jira/browse/ARROW-16761.
I would guess the problem is that the Python field is an int while the C++ code
returns an int64_t:
[https://github.com/apache/arrow/pull/13338/files#diff-4f2eb12337651b45bab2b03abe2552dd7fc9958b1fbbeb09a2a488804b097109R164]
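The suspected mismatch can be illustrated with ctypes (a sketch of the width issue only; the actual field lives in Cython's WrittenFile, and the 5GB size here is just the example from this report):

```python
import ctypes

size = 5 * 1024**3  # a ~5 GiB file size, as returned by the C++ writer

# The value round-trips through a 64-bit C integer (int64_t) unchanged:
assert ctypes.c_int64(size).value == size

# But it does not fit in a 32-bit C int, which is what a plain `int`
# attribute maps to in C-level code -- the value gets truncated:
assert ctypes.c_int32(size).value != size
```

If this is the cause, declaring the size field as int64_t on the Cython side (as in the linked PR diff) should avoid the OverflowError.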
> [Python] File write visitor throws exception on large parquet file
> ------------------------------------------------------------------
>
> Key: ARROW-17583
> URL: https://issues.apache.org/jira/browse/ARROW-17583
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 9.0.0
> Reporter: Joost Hoozemans
> Priority: Minor
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)