[
https://issues.apache.org/jira/browse/ARROW-11463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277230#comment-17277230
]
Antoine Pitrou commented on ARROW-11463:
----------------------------------------
[~lausen] I'm not sure your question has a possible answer, but please note
that both PyArrow serialization and Plasma are deprecated and unmaintained. For
the former, the recommended replacement is pickle with protocol 5. For the
latter, you may want to contact the developers of the Ray project (they used to
maintain Plasma and decided to fork it).
> Allow configuration of IpcWriterOptions 64Bit from PyArrow
> ----------------------------------------------------------
>
> Key: ARROW-11463
> URL: https://issues.apache.org/jira/browse/ARROW-11463
> Project: Apache Arrow
> Issue Type: Task
> Components: Python
> Reporter: Leonard Lausen
> Assignee: Tao He
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> For tables with many chunks (2M+ rows, 20k+ chunks), `pyarrow.Table.take`
> will be around 1000x slower compared to the `pyarrow.Table.take` on the table
> with combined chunks (1 chunk). Unfortunately, if such table contains large
> list data type, it's easy for the flattened table to contain more than 2**31
> rows and serialization of the table with combined chunks (eg for Plasma
> store) will fail due to `pyarrow.lib.ArrowCapacityError: Cannot write arrays
> larger than 2^31 - 1 in length`
> I couldn't find a way to enable 64bit support for the serialization as called
> from Python (IpcWriteOptions in Python does not expose the CIpcWriteOptions
> 64 bit setting; further the Python serialization APIs do not allow
> specification of IpcWriteOptions)
> I was able to serialize successfully after changing the default and rebuilding
> {code:c++}
> modified cpp/src/arrow/ipc/options.h
> @@ -42,7 +42,7 @@ struct ARROW_EXPORT IpcWriteOptions {
> /// \brief If true, allow field lengths that don't fit in a signed 32-bit
> int.
> ///
> /// Some implementations may not be able to parse streams created with
> this option.
> - bool allow_64bit = false;
> + bool allow_64bit = true;
>
> /// \brief The maximum permitted schema nesting depth.
> int max_recursion_depth = kMaxNestingDepth;
> {code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)