nevi-me commented on issue #1882:
URL: https://github.com/apache/arrow-rs/issues/1882#issuecomment-1169528840

   @alamb there's indeed a coverage gap. If I change the column "a" to "f", and 
run the tests, the schema order changes, with the fields ordered alphabetically 
with `BTreeMap`.
   
   From what I can see, this change should be safe, as users should be looking 
up columns in a record batch by schema index, and not the column order of the 
input JSON data.
   
   It would be hard to see someone relying on some column order for JSON 
though, because what if the first record doesn't have some field (`b, c, d`) 
and the next one has (`a, e, f`)? So I think preserving field order might not 
that big a requirement.
   
   The failures are because we're testing field orders, and changing the input 
JSON reveals the difference.
   
   The test `test_basic_json` has the following:
   
   ```rust
   let a = schema.column_with_name("a").unwrap();
   # check that the column index is 0
   assert_eq!(0, a.0);
   ```
   
   This fails after changing column "a" to "f"
   
   ```diff
   let f = schema.column_with_name("f").unwrap();
   # check that the column index is 0
   - assert_eq!(0, f.0);
   + assert_eq!(4, f.0);
   ```
   
   So it looks like a safe solution is to assert that the field does exists, 
and not its order in the schema.
   
   By aliasing `IndexMap as HashMap`, we mistakenly leaked this type in a few 
public functions, e.g. `ReaderBuilder::with_format_strings`.
   Changing them back to `std::collections::HashMap` suffices, but this 
introduces a breaking change in the API.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to