scovich opened a new pull request, #7092: URL: https://github.com/apache/arrow-rs/pull/7092
# Which issue does this PR close?

Closes https://github.com/apache/arrow-rs/issues/6522

# Rationale for this change

The current JSON decoder has no way to distinguish record boundaries from buffer boundaries, which makes it very difficult to correctly and efficiently parse a series of unrelated JSON values (such as those in a `StringArray` column). Problematic inputs include: blank strings (no rows produced), a single string containing multiple records (multiple rows produced), or multiple invalid strings whose concatenation looks like a single record (one row produced). Such cases can be detected easily by checking the number of records parsed and whether the last record was incomplete, but that state is not publicly accessible (it is buried in the `TapeDecoder` struct).

# What changes are included in this PR?

Expose two new methods on the `TapeDecoder` struct. These support three new pub methods on `Decoder`, which expose the number of records the decoder has buffered so far and whether the last record is partial (incomplete). Also update documentation and add unit tests.

# Are there any user-facing changes?

Three new public methods on `Decoder`: `has_partial_record`, `len`, and `is_empty`.
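The check described in the rationale (count the records parsed, then ask whether the last one was incomplete) can be sketched without any dependency on arrow-rs. In the sketch below, `ToyDecoder` and `bad_rows` are hypothetical names invented for illustration. `ToyDecoder` is a brace-counting stand-in, not the real `Decoder` or `TapeDecoder`, and only its three accessor names mirror the methods this PR adds. It assumes every record is a JSON object and that `len` counts only complete records; the real decoder's accounting may differ.

```rust
/// Hypothetical stand-in for a streaming JSON decoder. This is NOT the
/// arrow-rs `Decoder`; it only tracks enough state to answer the same
/// three questions: how many records, none at all, or one left unfinished.
#[derive(Default)]
struct ToyDecoder {
    complete: usize, // fully parsed top-level objects seen so far
    depth: usize,    // current `{` nesting depth
    in_string: bool, // inside a JSON string literal
    escaped: bool,   // previous character in the string was a backslash
}

impl ToyDecoder {
    /// Feed one buffer. State carries over between calls, so a record may
    /// span several buffers, which is exactly what makes record boundaries
    /// and buffer boundaries hard to tell apart from the outside.
    fn decode(&mut self, buf: &str) {
        for c in buf.chars() {
            if self.in_string {
                if self.escaped {
                    self.escaped = false;
                } else if c == '\\' {
                    self.escaped = true;
                } else if c == '"' {
                    self.in_string = false;
                }
                continue;
            }
            match c {
                '"' => self.in_string = true,
                '{' => self.depth += 1,
                '}' if self.depth > 0 => {
                    self.depth -= 1;
                    if self.depth == 0 {
                        self.complete += 1;
                    }
                }
                _ => {} // anything else is ignored by this toy
            }
        }
    }

    /// Number of complete records seen (assumption of this toy).
    fn len(&self) -> usize {
        self.complete
    }

    /// True if no complete record has been seen.
    fn is_empty(&self) -> bool {
        self.complete == 0
    }

    /// True if the input ended part way through a record.
    fn has_partial_record(&self) -> bool {
        self.depth > 0 || self.in_string
    }
}

/// Return the indices of strings that do not hold exactly one complete
/// record. A fresh decoder per string keeps one bad string from
/// contaminating the next.
fn bad_rows(column: &[&str]) -> Vec<usize> {
    let mut bad = Vec::new();
    for (i, s) in column.iter().enumerate() {
        let mut decoder = ToyDecoder::default();
        decoder.decode(s);
        if decoder.is_empty() || decoder.len() > 1 || decoder.has_partial_record() {
            bad.push(i);
        }
    }
    bad
}
```

For the column `["{\"a\": 1}", "", "{\"a\": 1} {\"a\": 2}", "{\"a\":", " 3}"]`, `bad_rows` flags indices 1 through 4: a blank string, a string with two records, and two fragments. Fed back to back into a single `ToyDecoder`, those two fragments would instead count as one valid record.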