scovich commented on issue #6522: URL: https://github.com/apache/arrow-rs/issues/6522#issuecomment-2639914570
Update: I just remembered that there's a critical difference between JSON-parsing a column of strings and parsing a JSON file. The [reader](https://docs.rs/arrow-json/50.0.0/arrow_json/reader/index.html) is meant for general JSON parsing, not just newline-delimited JSON, and ignores all whitespace that doesn't actively break parsing (e.g. it doesn't tolerate newlines inside a field name or integer literal):

> The reader ignores whitespace between JSON values, including \n and \r, allowing parsing of sequences of one or more arbitrarily formatted JSON values, including but not limited to newline-delimited JSON.

While that makes sense for file parsing, the values inside a column of strings need to be parsed completely independently of each other. _That_ is why column parsing needs more state to be exposed, so that the caller can verify that each string was a single complete record before moving on to the next.
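To make the "single complete record per string" check concrete, here is a rough, standard-library-only sketch. It does not use `arrow-json` at all: `CellStatus` and `classify_cell` are hypothetical names invented for illustration, and the scanner assumes every record is a JSON object or array. It only tracks bracket depth and string literals, so it is not a JSON validator.

```rust
/// Hypothetical classification of one string cell (illustration only).
#[derive(Debug, PartialEq)]
enum CellStatus {
    /// Exactly one complete top-level object or array.
    Single,
    /// Nothing but whitespace.
    Empty,
    /// A value was opened but never closed.
    Incomplete,
    /// Several complete top-level values in the same cell.
    Multiple(usize),
    /// Something other than an object or array at the top level.
    NotARecord,
}

/// Scans `cell` and reports how many top-level objects/arrays it holds.
/// Assumption: records are objects or arrays; bracket kinds are not matched.
fn classify_cell(cell: &str) -> CellStatus {
    let mut depth = 0usize; // current nesting level
    let mut complete = 0usize; // finished top-level values
    let mut in_string = false;
    let mut escaped = false;

    for c in cell.chars() {
        if in_string {
            // Inside a string literal, brackets are plain data.
            if escaped {
                escaped = false;
            } else if c == '\\' {
                escaped = true;
            } else if c == '"' {
                in_string = false;
            }
            continue;
        }
        match c {
            '"' if depth > 0 => in_string = true,
            '{' | '[' => depth += 1,
            '}' | ']' => {
                if depth == 0 {
                    return CellStatus::NotARecord;
                }
                depth -= 1;
                if depth == 0 {
                    complete += 1;
                }
            }
            // At the top level only JSON whitespace may separate values.
            _ if depth == 0 && !matches!(c, ' ' | '\t' | '\n' | '\r') => {
                return CellStatus::NotARecord;
            }
            _ => {}
        }
    }

    if depth > 0 {
        return CellStatus::Incomplete;
    }
    match complete {
        0 => CellStatus::Empty,
        1 => CellStatus::Single,
        n => CellStatus::Multiple(n),
    }
}
```

A whitespace-tolerant file reader would happily accept a cell like `{"a": 1}\n{"a": 2}` as two records, whereas this sketch reports it as `Multiple(2)`, and a truncated cell like `{"a": ` as `Incomplete`. Those are the two cases a column parser would need enough exposed state to detect.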
