Jefffrey opened a new issue, #5547: URL: https://github.com/apache/arrow-rs/issues/5547
**Describe the bug**

Given a Feather file written by PyArrow, reading it with the arrow-ipc reader fails with a flatbuffers ParseError caused by invalid UTF-8.

**To Reproduce**

Given this ORC file: https://github.com/apache/orc/blob/fa627ec6d7c72289c8a83632e6a43ae48603fc4b/examples/TestOrcFile.metaData.orc

Use PyArrow 15.0.0 to read it and write it out as Feather:

```python
>>> from pyarrow import feather, orc
>>> table = orc.read_table("TestOrcFile.metaData.orc")
>>> feather.write_feather(table, "/tmp/test.feather")
>>> feather.read_table("/tmp/test.feather")
pyarrow.Table
boolean1: bool
byte1: int8
short1: int16
int1: int32
long1: int64
float1: float
double1: double
bytes1: binary
string1: string
middle: struct<list: list<item: struct<int1: int32, string1: string>>>
  child 0, list: list<item: struct<int1: int32, string1: string>>
      child 0, item: struct<int1: int32, string1: string>
          child 0, int1: int32
          child 1, string1: string
list: list<item: struct<int1: int32, string1: string>>
  child 0, item: struct<int1: int32, string1: string>
      child 0, int1: int32
      child 1, string1: string
map: map<string, struct<int1: int32, string1: string>>
  child 0, entries: struct<key: string not null, value: struct<int1: int32, string1: string>> not null
      child 0, key: string not null
      child 1, value: struct<int1: int32, string1: string>
          child 0, int1: int32
          child 1, string1: string
----
boolean1: [[true]]
byte1: [[127]]
short1: [[1024]]
int1: [[42]]
long1: [[45097156608]]
float1: [[3.1415]]
double1: [[-2.713]]
bytes1: [[null]]
string1: [[null]]
middle: [
  -- is_valid: [false]
  -- child 0 type: list<item: struct<int1: int32, string1: string>>
[null]]
...
>>>
```

Then try to read this file with arrow-ipc:

```rust
#[test]
fn test_123() {
    let _ = FileReaderBuilder::new()
        .build(std::fs::File::open("/tmp/test.feather").unwrap())
        .unwrap();
}
```

The test fails with this error:

```
arrow-rs$ cargo test -p arrow-ipc --lib reader::tests::test_123
    Finished test [unoptimized + debuginfo] target(s) in 0.05s
     Running unittests src/lib.rs (target/debug/deps/arrow_ipc-b6339780ea47b538)

running 1 test
test reader::tests::test_123 ... FAILED

failures:

---- reader::tests::test_123 stdout ----
thread 'reader::tests::test_123' panicked at arrow-ipc/src/reader.rs:1862:14:
called `Result::unwrap()` on an `Err` value: ParseError("Unable to get root as footer: Utf8Error { error: Utf8Error { valid_up_to: 1, error_len: Some(1) }, range: 208..40208, error_trace: ErrorTrace([TableField { field_name: \"value\", position: 200 }, VectorElement { index: 0, position: 96 }, TableField { field_name: \"custom_metadata\", position: 88 }, TableField { field_name: \"schema\", position: 24 }]) }")
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    reader::tests::test_123

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 36 filtered out; finished in 0.00s

error: test failed, to rerun pass `-p arrow-ipc --lib`
```

**Expected behavior**

The file should be read successfully.

**Additional context**

Though the error likely lies upstream with flatbuffers, maybe there is a way to let the IPC reader ignore invalid custom_metadata via user configuration?
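
For reference, the inner `Utf8Error { valid_up_to: 1, error_len: Some(1) }` in the message above looks like the standard library's `std::str::Utf8Error`, which is what you would see if the `value` of the first `custom_metadata` entry held raw bytes rather than text. Below is a minimal sketch of how those two fields arise. The bytes are made up for illustration and are not the actual metadata from the ORC file, and `utf8_failure` is a hypothetical helper, not part of arrow-rs or flatbuffers.

```rust
use std::str;

/// Hypothetical helper: returns `(valid_up_to, error_len)` when `bytes`
/// is not valid UTF-8, or `None` when it is.
fn utf8_failure(bytes: &[u8]) -> Option<(usize, Option<usize>)> {
    str::from_utf8(bytes)
        .err()
        .map(|e| (e.valid_up_to(), e.error_len()))
}

fn main() {
    // Assumed payload: one ASCII byte followed by 0xFF, a byte that can
    // never appear in well-formed UTF-8.
    let binary_like = [b'a', 0xFF, b'b'];
    // One valid byte precedes the failure, and the invalid sequence is one byte long.
    println!("{:?}", utf8_failure(&binary_like));

    // Ordinary text passes validation.
    println!("{:?}", utf8_failure(b"plain text"));
}
```

In the failing file, the reported `range: 208..40208` suggests the offending value is about 40,000 bytes long, with validation failing at its second byte.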
