tustvold commented on issue #7119: URL: https://github.com/apache/arrow-rs/issues/7119#issuecomment-2652154904
Running the provided test, we get the following output:

```
Batch 1: RecordBatch { schema: Schema { fields: [Field { name: "outer", data_type: Struct([Field { name: "inner", data_type: Int32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, columns: [StructArray
-- validity:
[
  null,
]
[
-- child 0: "inner" (Int32)
PrimitiveArray<Int32>
[
  null,
]
]], row_count: 1 }
Col1: PrimitiveArray<Int32>
[
  null,
]
Batch 2: RecordBatch { schema: Schema { fields: [Field { name: "outer", data_type: Struct([Field { name: "inner", data_type: Int32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, columns: [StructArray
-- validity:
[
  null,
]
[
-- child 0: "inner" (Int32)
PrimitiveArray<Int32>
[
  0,
]
]], row_count: 1 }
Col2: PrimitiveArray<Int32>
[
  0,
]
```

These two batches are logically equal, which can be seen from the fact that the following assertion passes:

```
assert_eq!(batch1, batch2);
```

However, `assert_eq!(col1, col2);` fails. The child columns are not equal, and they don't need to be: **the value of a child masked by a null in a parent is arbitrary**. Parquet doesn't encode these values at all.

I therefore don't think this is a bug.
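To illustrate the distinction between logical and physical equality, here is a minimal std-only sketch. `StructColumn`, `logically_eq` and `children_physically_eq` are hypothetical names invented for this example. They are a simplified model of a nullable struct with one `Int32` child, not the arrow-rs `StructArray` API or its actual comparison code.

```rust
/// Simplified stand-in for a nullable struct column with one Int32 child.
/// (Hypothetical model for illustration; not the arrow-rs `StructArray` type.)
#[derive(Debug, Clone)]
struct StructColumn {
    /// `true` where the parent struct slot is valid, `false` where it is null.
    validity: Vec<bool>,
    /// Physical child values; slots under a null parent hold arbitrary data.
    inner: Vec<i32>,
}

impl StructColumn {
    /// Logical equality: child values are compared only where the parent is valid.
    fn logically_eq(&self, other: &StructColumn) -> bool {
        self.validity == other.validity
            && self.inner.len() == other.inner.len()
            && self
                .validity
                .iter()
                .zip(self.inner.iter().zip(other.inner.iter()))
                .all(|(&valid, (a, b))| !valid || a == b)
    }

    /// Physical equality of the raw child values, ignoring the parent mask.
    fn children_physically_eq(&self, other: &StructColumn) -> bool {
        self.inner == other.inner
    }
}

fn main() {
    // Row 0 is null in the parent, so its child value (7 vs 0) is arbitrary.
    let a = StructColumn { validity: vec![false, true], inner: vec![7, 1] };
    let b = StructColumn { validity: vec![false, true], inner: vec![0, 1] };

    // The columns agree logically even though the child buffers differ.
    assert!(a.logically_eq(&b));
    assert!(!a.children_physically_eq(&b));
    println!("logically equal: {}", a.logically_eq(&b));
}
```

In this model, comparing the parents plays the role of `assert_eq!(batch1, batch2)`, while comparing the extracted children directly plays the role of `assert_eq!(col1, col2)`.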