olwmc opened a new issue, #4432:
URL: https://github.com/apache/arrow-rs/issues/4432
**Describe the bug**
Constructing an `arrow::ipc::reader::FileReader` with `FileReader::try_new`
fails on an Arrow file with more than one million columns.
**To Reproduce**
Running the following code on an Arrow file with more than one million
columns,
```rust
use std::fs::File;
use arrow::ipc::reader::FileReader;

// Open the Arrow IPC file and hand the handle to the reader.
let file = File::open("my-file-with-gt-1million-cols.arrow").unwrap();
let reader = FileReader::try_new(file, None).unwrap();
```
Produces the following error:
```
panicked at 'called `Result::unwrap()` on an `Err` value: IoError("Unable to get root as footer: TooManyTables")'
```
**Expected behavior**
`FileReader::try_new` should succeed and return a usable `FileReader`.
**Additional context**
I spent some time tracing this bug, and it comes from a default setting in
the flatbuffers crate. I believe it is a direct consequence of this chain of
calls: `FileReader::try_new` -> `File::root_as_footer` -> `flatbuffers::root`.
The last of these constructs a `VerifierOptions` with default arguments, and
`VerifierOptions::default().max_tables` is 1,000,000. As a result, calling
`FileReader::try_new` on a file that contains > 1,000,000 columns returns an
error from [this
line](https://docs.rs/arrow-ipc/41.0.0/src/arrow_ipc/reader.rs.html#691), which
propagates the `TooManyTables` error raised
[here](https://github.com/google/flatbuffers/blob/23922e7eba51e666f9af13942ddb6cd6d58ef0de/rust/flatbuffers/src/verifier.rs#L418).
I believe this issue could be fixed by changing
[`root_as_footer`](https://github.com/apache/arrow-rs/blob/15e0e76bfb6500799a43991f1339e69464c513f8/arrow-ipc/src/gen/File.rs#LL444C5-L444C5)
to first construct a `VerifierOptions` with a larger `max_tables` and then
pass it to `flatbuffers::root_with_opts` along with the buffer. This avoids
the default one-million limit imposed by
[`flatbuffers::root`](https://github.com/google/flatbuffers/blob/23922e7eba51e666f9af13942ddb6cd6d58ef0de/rust/flatbuffers/src/get_root.rs#L26),
which just constructs a `VerifierOptions::default()` and passes it to
`root_with_opts` anyway.
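
To make the mechanism concrete, here is a minimal, dependency-free sketch of a
table-counting verifier. It is a simplified model and not the flatbuffers
implementation: the `VerifierOptions` struct, `VerifyError`, `verify`, and
`verify_with_opts` below are hypothetical stand-ins that only mirror the names
and the 1,000,000 default described above, and the assumption that verification
visits one table per column is mine.

```rust
/// Hypothetical stand-in for `flatbuffers::VerifierOptions` (one field only).
#[derive(Debug, Clone)]
struct VerifierOptions {
    max_tables: usize,
}

impl Default for VerifierOptions {
    fn default() -> Self {
        // Mirrors the default limit cited in this issue.
        Self { max_tables: 1_000_000 }
    }
}

#[derive(Debug, PartialEq)]
enum VerifyError {
    TooManyTables,
}

/// Model verifier: visits `num_tables` tables and fails as soon as the
/// running count exceeds `opts.max_tables`.
fn verify_with_opts(opts: &VerifierOptions, num_tables: usize) -> Result<usize, VerifyError> {
    let mut visited = 0usize;
    for _ in 0..num_tables {
        visited += 1;
        if visited > opts.max_tables {
            return Err(VerifyError::TooManyTables);
        }
    }
    Ok(visited)
}

/// Analogue of `flatbuffers::root`: always verifies with the defaults.
fn verify(num_tables: usize) -> Result<usize, VerifyError> {
    verify_with_opts(&VerifierOptions::default(), num_tables)
}

fn main() {
    // Assumed for illustration: one table per column, one column over the limit.
    let tables = 1_000_001;

    // The default path rejects the input.
    println!("default options: {:?}", verify(tables));

    // Passing explicit options with a larger limit accepts the same input.
    let opts = VerifierOptions { max_tables: usize::MAX };
    println!("raised limit:    {:?}", verify_with_opts(&opts, tables));
}
```

In this model the default path fails only because the caller has no way to
supply options, which is the same shape as the proposed fix: route the call
through the variant that accepts explicit options.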
Additionally, `arrow2` seems to have [dealt with this issue
previously](https://github.com/jorgecarleitao/arrow2/commit/8e146a78f53899dd7f7166512c1b73a51803b0ab).