scovich commented on issue #6522:
URL: https://github.com/apache/arrow-rs/issues/6522#issuecomment-2639914570

   Update: I just remembered, there's a critical difference between 
JSON-parsing a column of strings vs. parsing a JSON file. The 
[reader](https://docs.rs/arrow-json/50.0.0/arrow_json/reader/index.html) is 
meant for general JSON parsing, not just newline-delimited JSON, and ignores 
all whitespace that doesn't actively break parsing (e.g. it doesn't tolerate 
newlines inside a field name or integer literal):
   > The reader ignores whitespace between JSON values, including \n and \r, 
allowing parsing of sequences of one or more arbitrarily formatted JSON values, 
including but not limited to newline-delimited JSON.
   
   While that makes sense for file parsing, the values inside a column of 
strings need to be parsed completely independently of each other. _That_ is why 
column parsing needs more state to be exposed, so that the caller can verify 
that each string was a single complete record before moving on to the next.
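
   To make the distinction concrete, here is a minimal sketch that uses 
Python's standard `json` module as a stand-in. It is not the arrow-json API; 
the helper names `parse_stream` and `parse_column` are hypothetical and only 
illustrate the two behaviours described above. The first helper skips 
whitespace between values the way a file-oriented reader does, and the second 
insists that every string holds exactly one complete value.

```python
import json

_decoder = json.JSONDecoder()


def parse_stream(text):
    """File-style parsing: whitespace (including \\n and \\r) between
    JSON values is skipped, so any number of values may appear."""
    values, pos = [], 0
    while True:
        # Skip whitespace separating top-level values.
        while pos < len(text) and text[pos] in " \t\r\n":
            pos += 1
        if pos == len(text):
            return values
        # raw_decode raises json.JSONDecodeError (a ValueError)
        # if the value starting at `pos` is malformed or truncated.
        value, pos = _decoder.raw_decode(text, pos)
        values.append(value)


def parse_column(strings):
    """Column-style parsing: each string is parsed on its own and must
    contain exactly one complete JSON value."""
    out = []
    for i, s in enumerate(strings):
        values = parse_stream(s)
        if len(values) != 1:
            raise ValueError(
                f"row {i}: expected 1 JSON value, found {len(values)}"
            )
        out.append(values[0])
    return out
```

   Under these assumptions, joining the rows `'{"a": 1'` and `', "b": 2}'` 
with a newline and handing the result to `parse_stream` yields one merged 
record, because the newline is just whitespace to a file-style parser. 
`parse_column` rejects the same two rows, since the first string is not a 
complete value on its own. That per-row check is what a caller can only do if 
the decoder exposes enough state to tell where each record ends.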


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

