scovich opened a new pull request, #7092:
URL: https://github.com/apache/arrow-rs/pull/7092

   # Which issue does this PR close?
   
   Closes https://github.com/apache/arrow-rs/issues/6522
   
   # Rationale for this change
   
   The current JSON decoder has no way to distinguish record boundaries from 
buffer boundaries, which makes it very difficult to correctly and efficiently 
parse a series of unrelated JSON values (such as from a `StringArray` column). 
Problematic inputs include: blank strings (no rows produced), a single string 
containing multiple records (multiple rows produced), or multiple invalid 
strings whose concatenation looks like a single record (one row produced). 
   
   Such cases can be detected easily by checking the number of records parsed, 
and whether the last record was incomplete -- but that state is not publicly 
accessible (buried in the `TapeDecoder` struct).
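
   As a rough illustration of the check described above, here is a minimal, self-contained sketch. `ToyDecoder`, `completed`, `has_partial`, and `check_one_record_per_string` are hypothetical names invented for this example; the toy only counts top-level JSON objects by brace depth and is not the `TapeDecoder` implementation. It only shows how a record count plus a partial-record flag lets a caller reject each of the bad inputs listed earlier.

```rust
/// Hypothetical stand-in for the decoder state discussed above.
/// It tracks top-level JSON objects by brace depth only (an assumption
/// made for brevity); it is NOT arrow-json's `TapeDecoder`.
#[derive(Default)]
struct ToyDecoder {
    depth: usize,     // current nesting depth of `{ ... }`
    in_string: bool,  // true while inside a JSON string literal
    escaped: bool,    // true right after a backslash inside a string
    records: usize,   // number of completed top-level objects
}

impl ToyDecoder {
    /// Feed one buffer; state carries over between calls.
    fn decode(&mut self, buf: &[u8]) {
        for &b in buf {
            if self.in_string {
                if self.escaped {
                    self.escaped = false;
                } else if b == b'\\' {
                    self.escaped = true;
                } else if b == b'"' {
                    self.in_string = false;
                }
                continue;
            }
            match b {
                b'"' => self.in_string = true,
                b'{' => self.depth += 1,
                b'}' if self.depth > 0 => {
                    self.depth -= 1;
                    if self.depth == 0 {
                        self.records += 1;
                    }
                }
                _ => {}
            }
        }
    }

    /// Number of completed records seen so far (toy semantics).
    fn completed(&self) -> usize {
        self.records
    }

    /// True if the input so far ends part way through a record.
    fn has_partial(&self) -> bool {
        self.depth > 0 || self.in_string
    }
}

/// Require every string to contribute exactly one complete record.
fn check_one_record_per_string(values: &[&str]) -> Result<(), String> {
    let mut dec = ToyDecoder::default();
    for (i, v) in values.iter().enumerate() {
        let before = dec.completed();
        dec.decode(v.as_bytes());
        if dec.has_partial() {
            return Err(format!("value {i}: incomplete record"));
        }
        let produced = dec.completed() - before;
        if produced != 1 {
            return Err(format!("value {i}: expected 1 record, got {produced}"));
        }
    }
    Ok(())
}
```

   With this sketch, a blank string fails because it produces zero records, a string holding two objects fails because it produces two, and a pair of fragments such as `{"a":` followed by `1}` fails at the first fragment because a record is left incomplete.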
   
   # What changes are included in this PR?
   
   Expose two new methods on the `TapeDecoder` struct. These support three new 
public methods on `Decoder`, which expose the number of records the decoder has 
buffered so far and whether the last record is partial (incomplete).
   
   Also update documentation and add unit tests.
   
   # Are there any user-facing changes?
   
   Three new public methods on `Decoder`: `has_partial_record`, `len`, and 
`is_empty`.


