scovich opened a new pull request, #9232:
URL: https://github.com/apache/arrow-rs/pull/9232

   # Which issue does this PR close?
   
   - Closes https://github.com/apache/arrow-rs/issues/9204
   
   # Rationale for this change
   
   It's not good to tolerate obviously ill-formed JSON like `[,,, 10,,, 20,,,]`
   
   # What changes are included in this PR?
   
   Reject leading and repeated commas while still tolerating at most one 
trailing comma, since that's a common and intuitive case.
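
   As a rough illustration of the rule above (not the actual tape decoder 
code in this PR), the comma handling can be sketched as a three-state check 
over a flat array of unsigned integers. The names `State` and `commas_ok` are 
hypothetical and exist only for this example.

```rust
/// Hypothetical helper, not part of arrow-rs: checks comma placement in a
/// flat JSON-like array of unsigned integers such as "[10, 20,]".
#[derive(Clone, Copy, PartialEq)]
enum State {
    Start,      // just after '[': a value or the closing ']' may follow
    AfterValue, // a ',' or the closing ']' may follow
    AfterComma, // a value may follow, or ']' (the single trailing comma)
}

fn commas_ok(input: &str) -> bool {
    // Require the surrounding brackets and look only at what is inside them.
    let inner = match input
        .trim()
        .strip_prefix('[')
        .and_then(|s| s.strip_suffix(']'))
    {
        Some(inner) => inner,
        None => return false,
    };

    let mut state = State::Start;
    let mut chars = inner.chars().peekable();
    while let Some(&c) = chars.peek() {
        match c {
            c if c.is_ascii_whitespace() => {
                chars.next();
            }
            ',' => {
                // A comma is only legal directly after a value, which rules
                // out both leading commas and repeated commas.
                if state != State::AfterValue {
                    return false;
                }
                state = State::AfterComma;
                chars.next();
            }
            c if c.is_ascii_digit() => {
                // Two values in a row without a separating comma are invalid.
                if state == State::AfterValue {
                    return false;
                }
                while chars.peek().map_or(false, |d| d.is_ascii_digit()) {
                    chars.next();
                }
                state = State::AfterValue;
            }
            _ => return false,
        }
    }
    // Ending in `AfterComma` means exactly one trailing comma, which is
    // tolerated; `Start` means an empty array.
    true
}
```

Under this sketch, `[10, 20]` and `[10, 20,]` pass, while `[,10]`, `[10,,20]`, 
`[10, 20,,]`, and `[,,, 10,,, 20,,,]` are rejected.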
   
   While we're at it, optimize the tape decoder state machine to eliminate 
redundant decision-making. The performance benefits from that optimization 
compensate for the performance loss due to checking separately for commas.
   
   # Are these changes tested?
   
   Yes. New unit tests cover the expected behavior change, and benchmarking 
shows a moderate overall improvement in performance. Exception: the three 
`xxx_hex_json` variants are very noisy and show anything from 15% speedup to 
20% slowdown from run to run. But as far as I can tell they are all 
tape-decoding the exact same input JSON values, and any performance differences 
in the tape decoder should affect them equally. This leads me to conclude that 
those three benchmark cases are just plain unstable.
   
   # Are there any user-facing changes?
   
   JSON parsing now rejects ill-formed JSON it used to accept. Not sure if this 
might merit a documentation change?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
