shashbha14 opened a new pull request, #49177:
URL: https://github.com/apache/arrow/pull/49177
Fixes #49158
The issue: when you provide an explicit schema to the JSON parser, it errors
if JSON types don't exactly match schema types, even when conversion is
straightforward.
For example, if you have:
{"_id": "152934"}
{"_id": 152934}
And your schema says `_id` should be string, it fails on row
1 with "Column changed from string to number" instead of converting 152934 to
"152934".
I fixed this by making the parser attempt type conversion when an explicit
schema is provided. Before erroring on a type mismatch, it checks if we have an
explicit schema and tries to convert the value to match the expected type.
Changes:
- Store explicit_schema in HandlerBase so we can access it during parsing
- Modified AppendScalar() to try conversion before erroring when an explicit
schema exists
- Added TryConvertAndAppend() helper that handles the conversion logic
- Updated Bool() handler to also support conversion
- Added tests for number->string and string->number cases
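To illustrate the control flow these changes describe, here is a minimal Python sketch. It is a toy model, not the C++ code in this PR: `append_scalar` and `try_convert_and_append` are hypothetical stand-ins for AppendScalar() and TryConvertAndAppend(), the expected kind is passed in directly rather than read from a schema, and only number -> string is handled.

```python
import json


def json_kind(value):
    """Classify a decoded JSON scalar as 'boolean', 'number', or 'string'."""
    # bool must be tested first because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"unsupported JSON value: {value!r}")


def try_convert_and_append(column, value, expected_kind):
    """Hypothetical helper: append a converted value, or return False."""
    # Assumption: this sketch only covers number -> string.
    if expected_kind == "string" and json_kind(value) == "number":
        column.append(str(value))
        return True
    return False


def append_scalar(column, value, expected_kind, has_explicit_schema, row):
    """Append a value, attempting conversion before reporting a mismatch."""
    actual_kind = json_kind(value)
    if actual_kind == expected_kind:
        column.append(value)
        return
    # Conversion is attempted only when the caller supplied a schema.
    if has_explicit_schema and try_convert_and_append(column, value, expected_kind):
        return
    raise ValueError(
        f"Column changed from {expected_kind} to {actual_kind} in row {row}"
    )


def parse_column(lines, field, expected_kind, has_explicit_schema):
    """Parse one field from newline-delimited JSON rows into a list."""
    column = []
    for row, line in enumerate(lines):
        value = json.loads(line)[field]
        append_scalar(column, value, expected_kind, has_explicit_schema, row)
    return column
```

With the two rows from the example above, `parse_column(rows, "_id", "string", True)` yields two strings, while passing `False` for `has_explicit_schema` raises on row 1 as before.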
Conversions that work now:
- Number -> String (152934 -> "152934")
- String -> Number (when the string is numeric)
- Boolean conversions to/from string and number
- Number -> Boolean (0 is false, non-zero is true)
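The conversion rules listed above could be modelled roughly as follows. This is a Python sketch under stated assumptions, not the PR's implementation: `convert_scalar` is a hypothetical name, and the spellings "true"/"false" for booleans, the 1/0 mapping for boolean -> number, and the use of the JSON number grammar to decide whether a string is numeric are my guesses at reasonable behavior rather than details confirmed by the PR.

```python
import re

# JSON number grammar (RFC 8259): optional minus, integer part,
# optional fraction, optional exponent.
_JSON_NUMBER = re.compile(r"-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?")


def convert_scalar(value, target):
    """Convert a mismatched JSON scalar to `target`, or raise ValueError.

    `target` is one of 'string', 'number', 'boolean'. Intended to be
    called only when the value's own kind differs from `target`.
    """
    # bool is checked first because bool is a subclass of int in Python.
    if isinstance(value, bool):
        if target == "string":
            return "true" if value else "false"  # assumed spelling
        if target == "number":
            return 1 if value else 0  # assumed mapping
    elif isinstance(value, (int, float)):
        if target == "string":
            return str(value)
        if target == "boolean":
            return value != 0  # 0 is false, non-zero is true
    elif isinstance(value, str):
        if target == "number" and _JSON_NUMBER.fullmatch(value):
            # Keep integers as int; anything with a fraction or
            # exponent becomes a float.
            is_float = any(c in value for c in ".eE")
            return float(value) if is_float else int(value)
        if target == "boolean" and value in ("true", "false"):
            return value == "true"  # assumed spelling
    raise ValueError(f"cannot convert {value!r} to {target}")
```

For instance, `convert_scalar(152934, "string")` returns `"152934"`, and a non-numeric string such as `"abc"` with target `"number"` raises `ValueError`, which corresponds to falling back to the original type-mismatch error.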
This only happens when an explicit schema is provided, so parsing without one
behaves exactly as before and the change is backward compatible. All existing
tests still pass.