olwmc opened a new issue, #4432:
URL: https://github.com/apache/arrow-rs/issues/4432

   **Describe the bug**
   Attempting to construct an `arrow::ipc::reader::FileReader` using `FileReader::try_new` on an Arrow file with more than one million columns fails.
   
   **To Reproduce**
   Running the following code on a suitable file with more than one million columns:
   ```rust
   use std::fs::File;

   use arrow::ipc::reader::FileReader;

   let file = File::open("my-file-with-gt-1million-cols.arrow").unwrap();
   // `try_new` takes the reader by value plus an optional column projection.
   let reader = FileReader::try_new(file, None).unwrap();
   ```
   Produces the following error:
   ```
   panicked at 'called `Result::unwrap()` on an `Err` value: IoError("Unable to get root as footer: TooManyTables")'
   ```
   
   **Expected behavior**
   I would expect `FileReader::try_new` to succeed and construct the `FileReader` normally.
   
   **Additional context**
   I spent some time tracking this down, and the cause lies upstream, in a default setting of the `flatbuffers` crate. I believe it is a direct consequence of this chain of calls: `FileReader::try_new` -> `File::root_as_footer` -> `flatbuffers::root`. `flatbuffers::root` constructs a `VerifierOptions` with default arguments, and `VerifierOptions::default().max_tables` is 1,000,000. As a result, calling `FileReader::try_new` on a file that contains more than 1,000,000 columns fails: the flatbuffers verifier raises `TooManyTables` [here](https://github.com/google/flatbuffers/blob/23922e7eba51e666f9af13942ddb6cd6d58ef0de/rust/flatbuffers/src/verifier.rs#L418), and the error propagates out through [this line](https://docs.rs/arrow-ipc/41.0.0/src/arrow_ipc/reader.rs.html#691) in `arrow-ipc`.
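
   To make the failure mode concrete, here is a minimal, std-only Rust model of a verifier that counts visited tables against a `max_tables` budget. It is a sketch under stated assumptions, not the `flatbuffers` crate's code: `TableBudget`, `verify_footer`, and the footer layout (one footer table, one schema table, one table per field) are hypothetical simplifications, and a real schema may contain additional tables per column.

   ```rust
   // Hypothetical, std-only model of the table budget enforced by a
   // flatbuffers-style verifier. Names are invented; this is NOT the
   // flatbuffers crate's source.

   #[derive(Debug, PartialEq)]
   enum VerifyError {
       TooManyTables,
   }

   /// The `max_tables` default reported in the issue.
   const DEFAULT_MAX_TABLES: usize = 1_000_000;

   /// Counts tables as they are visited and fails once `max_tables` is exceeded.
   struct TableBudget {
       max_tables: usize,
       visited: usize,
   }

   impl TableBudget {
       fn new(max_tables: usize) -> Self {
           Self { max_tables, visited: 0 }
       }

       fn visit_table(&mut self) -> Result<(), VerifyError> {
           self.visited += 1;
           if self.visited > self.max_tables {
               return Err(VerifyError::TooManyTables);
           }
           Ok(())
       }
   }

   /// Walks a simplified footer: the footer table, its schema table, and one
   /// table per field (a simplification; real schemas may hold more per field).
   fn verify_footer(num_fields: usize, max_tables: usize) -> Result<(), VerifyError> {
       let mut budget = TableBudget::new(max_tables);
       budget.visit_table()?; // Footer
       budget.visit_table()?; // Schema
       for _ in 0..num_fields {
           budget.visit_table()?; // one Field per column
       }
       Ok(())
   }

   fn main() {
       // 999,998 fields + footer + schema = exactly 1,000,000 tables: accepted.
       assert_eq!(verify_footer(999_998, DEFAULT_MAX_TABLES), Ok(()));
       // One more field pushes the count past the default limit.
       assert_eq!(
           verify_footer(999_999, DEFAULT_MAX_TABLES),
           Err(VerifyError::TooManyTables)
       );
       println!("default budget: {} tables", DEFAULT_MAX_TABLES);
   }
   ```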
 
   
   I believe this issue could be fixed by changing [`root_as_footer`](https://github.com/apache/arrow-rs/blob/15e0e76bfb6500799a43991f1339e69464c513f8/arrow-ipc/src/gen/File.rs#LL444C5-L444C5) to first construct a `VerifierOptions` with a larger `max_tables` and then pass it to `flatbuffers::root_with_opts` along with the buffer. This avoids the default one-million limit imposed by [`flatbuffers::root`](https://github.com/google/flatbuffers/blob/23922e7eba51e666f9af13942ddb6cd6d58ef0de/rust/flatbuffers/src/get_root.rs#L26), which simply constructs a `VerifierOptions::default()` and passes it to `root_with_opts` anyway.
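
   The proposed change can be sketched with a second std-only Rust model. The stand-ins below only mimic the shape of the real API the issue refers to (`flatbuffers::root` delegating to `root_with_opts` with `VerifierOptions::default()`); `FooterBuf` and `MAX_FOOTER_TABLES` are hypothetical, and the limit value is an arbitrary placeholder. Against the real crate, the equivalent step would be building `VerifierOptions { max_tables: ..., ..Default::default() }` and passing it to `flatbuffers::root_with_opts`.

   ```rust
   // Hypothetical, std-only model of the proposed fix. `VerifierOptions`,
   // `root`, and `root_with_opts` are stand-ins shaped like the flatbuffers
   // crate's API; they are NOT its real implementations.

   #[derive(Debug, PartialEq)]
   enum InvalidFlatbuffer {
       TooManyTables,
   }

   #[allow(dead_code)] // `max_depth` exists only to show struct-update syntax
   #[derive(Debug, Clone)]
   struct VerifierOptions {
       max_depth: usize,
       max_tables: usize,
   }

   impl Default for VerifierOptions {
       fn default() -> Self {
           // `max_tables` mirrors the 1,000,000 default cited in the issue;
           // `max_depth` is an illustrative value for this model.
           Self { max_depth: 64, max_tables: 1_000_000 }
       }
   }

   /// Stand-in for a serialized footer: only its table count matters here.
   struct FooterBuf {
       num_tables: usize,
   }

   /// Verifies `buf` against caller-supplied options.
   fn root_with_opts(opts: &VerifierOptions, buf: &FooterBuf) -> Result<(), InvalidFlatbuffer> {
       if buf.num_tables > opts.max_tables {
           return Err(InvalidFlatbuffer::TooManyTables);
       }
       Ok(())
   }

   /// Like `flatbuffers::root`: builds default options and delegates.
   fn root(buf: &FooterBuf) -> Result<(), InvalidFlatbuffer> {
       root_with_opts(&VerifierOptions::default(), buf)
   }

   /// Hypothetical upper bound for footers; the right value is a design decision.
   const MAX_FOOTER_TABLES: usize = 100_000_000;

   /// Proposed shape of `root_as_footer`: raise only `max_tables`.
   fn root_as_footer(buf: &FooterBuf) -> Result<(), InvalidFlatbuffer> {
       let opts = VerifierOptions {
           max_tables: MAX_FOOTER_TABLES,
           ..Default::default()
       };
       root_with_opts(&opts, buf)
   }

   fn main() {
       let wide = FooterBuf { num_tables: 1_500_000 };
       assert_eq!(root(&wide), Err(InvalidFlatbuffer::TooManyTables));
       assert_eq!(root_as_footer(&wide), Ok(()));
       println!("raised max_tables accepts the wide footer");
   }
   ```

   Using struct-update syntax keeps every other verifier limit at its default, so only the table budget is relaxed.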
   
   Additionally, `arrow2` seems to have [dealt with this issue previously](https://github.com/jorgecarleitao/arrow2/commit/8e146a78f53899dd7f7166512c1b73a51803b0ab).

