rok commented on issue #49158: URL: https://github.com/apache/arrow/issues/49158#issuecomment-3862322083
The closest we come to documenting this behaviour is [here](https://arrow.apache.org/docs/cpp/json.html#data-types), and it doesn't explicitly cover your case. We don't seem to test for inconsistent JSON column types anywhere. As implemented, the parser will not cast from a number type to a string type when an explicit schema is provided.

As a workaround, you could omit `_id` from your schema and set `unexpected_field_behavior='ignore'`, but then you lose the column's data. Another workaround is to set `('_id', pa.decimal128(20, 0))` in the schema and cast the column to string later. This is not great, but it should work. Can you test these two workarounds and report back?

We should document the current behaviour better and then discuss whether we want to change it, and how.
