Jefffrey opened a new issue, #5547:
URL: https://github.com/apache/arrow-rs/issues/5547

   **Describe the bug**
   
   Reading a Feather file written by PyArrow with the arrow-ipc reader fails with a flatbuffers `ParseError` caused by invalid UTF-8.
   
   **To Reproduce**
   
   Given this ORC file:
   
   
   https://github.com/apache/orc/blob/fa627ec6d7c72289c8a83632e6a43ae48603fc4b/examples/TestOrcFile.metaData.orc
   
   When using PyArrow 15.0.0 to read it and write it out as Feather:
   
   ```python
   >>> from pyarrow import feather, orc
   >>> table = orc.read_table("TestOrcFile.metaData.orc")
   >>> feather.write_feather(table, "/tmp/test.feather")
   >>> feather.read_table("/tmp/test.feather")
   pyarrow.Table
   boolean1: bool
   byte1: int8
   short1: int16
   int1: int32
   long1: int64
   float1: float
   double1: double
   bytes1: binary
   string1: string
   middle: struct<list: list<item: struct<int1: int32, string1: string>>>
     child 0, list: list<item: struct<int1: int32, string1: string>>
         child 0, item: struct<int1: int32, string1: string>
             child 0, int1: int32
             child 1, string1: string
   list: list<item: struct<int1: int32, string1: string>>
     child 0, item: struct<int1: int32, string1: string>
         child 0, int1: int32
         child 1, string1: string
   map: map<string, struct<int1: int32, string1: string>>
     child 0, entries: struct<key: string not null, value: struct<int1: int32, string1: string>> not null
         child 0, key: string not null
         child 1, value: struct<int1: int32, string1: string>
             child 0, int1: int32
             child 1, string1: string
   ----
   boolean1: [[true]]
   byte1: [[127]]
   short1: [[1024]]
   int1: [[42]]
   long1: [[45097156608]]
   float1: [[3.1415]]
   double1: [[-2.713]]
   bytes1: [[null]]
   string1: [[null]]
   middle: [
     -- is_valid:  [false]
     -- child 0 type: list<item: struct<int1: int32, string1: string>>
   [null]]
   ...
   >>>
   ```
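
To narrow down which metadata entry is involved, one could scan the key/value pairs for values that are not valid UTF-8. The following is only a minimal sketch: `find_non_utf8_values` is a hypothetical helper, `sample` is stand-in data, and it assumes the offending entry is exposed through `table.schema.metadata` (a dict of bytes to bytes, or `None`) in the PyArrow session above.

```python
from typing import Dict, List, Optional, Tuple


def find_non_utf8_values(
    metadata: Optional[Dict[bytes, bytes]],
) -> List[Tuple[bytes, int, int]]:
    """Return (key, value_length, first_bad_offset) for each value that is not valid UTF-8."""
    bad = []
    for key, value in (metadata or {}).items():
        try:
            value.decode("utf-8")
        except UnicodeDecodeError as exc:
            # exc.start is the offset of the first byte that could not be decoded
            bad.append((key, len(value), exc.start))
    return bad


# Stand-in metadata for illustration only (not taken from the ORC file)
sample = {b"ok": b"plain text", b"blob": b"A\xff\x00"}
print(find_non_utf8_values(sample))
```

In the session above this would be called as `find_non_utf8_values(table.schema.metadata)`.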
   
   Then trying to read this file with arrow-ipc:
   
   ```rust
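       // Added to the `tests` module of arrow-ipc/src/reader.rs,
       // where `FileReaderBuilder` is already in scope.
       #[test]
       fn test_123() {
           let _ = FileReaderBuilder::new()
               .build(std::fs::File::open("/tmp/test.feather").unwrap())
               .unwrap();
       }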
   ```
   
   It fails with the following error:
   
   ```
   arrow-rs$ cargo test -p arrow-ipc --lib reader::tests::test_123
       Finished test [unoptimized + debuginfo] target(s) in 0.05s
        Running unittests src/lib.rs (target/debug/deps/arrow_ipc-b6339780ea47b538)
   
   running 1 test
   test reader::tests::test_123 ... FAILED
   
   failures:
   
   ---- reader::tests::test_123 stdout ----
   thread 'reader::tests::test_123' panicked at arrow-ipc/src/reader.rs:1862:14:
   called `Result::unwrap()` on an `Err` value: ParseError("Unable to get root as footer: Utf8Error { error: Utf8Error { valid_up_to: 1, error_len: Some(1) }, range: 208..40208, error_trace: ErrorTrace([TableField { field_name: \"value\", position: 200 }, VectorElement { index: 0, position: 96 }, TableField { field_name: \"custom_metadata\", position: 88 }, TableField { field_name: \"schema\", position: 24 }]) }")
   note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
   
   
   failures:
       reader::tests::test_123
   
   test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 36 filtered out; finished in 0.00s
   
   error: test failed, to rerun pass `-p arrow-ipc --lib`
   ```
   
   **Expected behavior**
   
   The reader should be able to read the file successfully.
   
   **Additional context**
   
   Though the error likely lies upstream in flatbuffers, perhaps the IPC reader could be allowed to ignore invalid custom_metadata via user configuration?
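
Until the reader offers such an option, a writer-side workaround might be to drop the non-UTF-8 entries before writing the Feather file. This is a rough sketch under assumptions, not a confirmed fix: `keep_utf8_metadata` is a hypothetical helper, `sample` is stand-in data, and it assumes the problematic entry comes from the table's schema metadata.

```python
from typing import Dict, Optional


def keep_utf8_metadata(metadata: Optional[Dict[bytes, bytes]]) -> Dict[bytes, bytes]:
    """Return only the entries whose key and value both decode as UTF-8."""
    kept = {}
    for key, value in (metadata or {}).items():
        try:
            key.decode("utf-8")
            value.decode("utf-8")
        except UnicodeDecodeError:
            continue  # skip entries that a strict UTF-8 check would reject
        kept[key] = value
    return kept


# Hypothetical stand-in for table.schema.metadata
sample = {b"note": b"hello", b"blob": b"\x00\xff\xfe"}
print(keep_utf8_metadata(sample))
```

With PyArrow, the filtered dictionary could be applied with `table.replace_schema_metadata(keep_utf8_metadata(table.schema.metadata))` before calling `feather.write_feather`. Whether that is enough for arrow-ipc to read the resulting file has not been verified here, and it discards the binary metadata rather than preserving it.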

