nevi-me commented on issue #1882:
URL: https://github.com/apache/arrow-rs/issues/1882#issuecomment-1169528840
@alamb there's indeed a coverage gap. If I change the column "a" to "f", and
run the tests, the schema order changes, with the fields ordered alphabetically
with `BTreeMap`.
From what I can see, this change should be safe, as users should be looking
up columns in a record batch by schema index, and not the column order of the
input JSON data.
It would be hard to see someone relying on some column order for JSON
though, because what if the first record doesn't have some field (`b, c, d`)
and the next one has (`a, e, f`)? So I think preserving field order might not
that big a requirement.
The failures are because we're testing field orders, and changing the input
JSON reveals the difference.
The test `test_basic_json` has the following:
```rust
let a = schema.column_with_name("a").unwrap();
# check that the column index is 0
assert_eq!(0, a.0);
```
This fails after changing column "a" to "f"
```diff
let f = schema.column_with_name("f").unwrap();
# check that the column index is 0
- assert_eq!(0, f.0);
+ assert_eq!(4, f.0);
```
So it looks like a safe solution is to assert that the field does exists,
and not its order in the schema.
By aliasing `IndexMap as HashMap`, we mistakenly leaked this type in a few
public functions, e.g. `ReaderBuilder::with_format_strings`.
Changing them back to `std::collections::HashMap` suffices, but this
introduces a breaking change in the API.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]