alihan-synnada opened a new pull request, #13412:
URL: https://github.com/apache/datafusion/pull/13412
## Which issue does this PR close?
None
## Rationale for this change
Part of #13411
This PR implements a common `Decoder` trait, the `BatchDeserializer` trait, and
the `DecoderDeserializer` struct as described in the issue. It also adds
`CsvDecoder` and `JsonDecoder`, since `arrow-csv` and `arrow-json` already
provide ready-made `Decoder`s.
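The sketch below shows the push-based shape that a common `Decoder` trait can take. It is a simplified, self-contained approximation, not the PR's code. The `Batch` alias stands in for arrow's `RecordBatch`, errors are plain strings, and `LineDecoder` is a hypothetical toy format for newline-terminated rows. Assuming arrow-style semantics, `decode` returns the number of bytes it consumed and stops early once a batch is full, so the caller flushes and then re-submits the unconsumed tail.

```rust
// Stand-ins (assumption): a batch is a Vec of text rows, errors are Strings.
type Batch = Vec<String>;

/// Push-based decoder, loosely modeled on the decode/flush methods of the
/// arrow-csv and arrow-json `Decoder`s (simplified, not their actual API).
trait Decoder {
    /// Consumes bytes from `buf` and returns how many were consumed.
    /// Returns early once a full batch is buffered.
    fn decode(&mut self, buf: &[u8]) -> Result<usize, String>;
    /// Emits the rows decoded so far as one batch, if there are any.
    fn flush(&mut self) -> Result<Option<Batch>, String>;
}

/// Hypothetical toy decoder: every row must end in '\n'; an unterminated
/// trailing row stays in `partial` and is never emitted.
struct LineDecoder {
    batch_size: usize,
    partial: Vec<u8>,  // bytes of a row whose '\n' has not arrived yet
    rows: Vec<String>, // completed rows that have not been flushed
}

impl LineDecoder {
    fn new(batch_size: usize) -> Self {
        assert!(batch_size > 0, "batch_size must be positive");
        Self { batch_size, partial: Vec::new(), rows: Vec::new() }
    }
}

impl Decoder for LineDecoder {
    fn decode(&mut self, buf: &[u8]) -> Result<usize, String> {
        let mut consumed = 0;
        for &byte in buf {
            if self.rows.len() == self.batch_size {
                break; // batch is full: stop so the caller can flush
            }
            consumed += 1;
            if byte == b'\n' {
                let row = std::mem::take(&mut self.partial);
                self.rows.push(String::from_utf8(row).map_err(|e| e.to_string())?);
            } else {
                self.partial.push(byte);
            }
        }
        Ok(consumed)
    }

    fn flush(&mut self) -> Result<Option<Batch>, String> {
        if self.rows.is_empty() {
            return Ok(None);
        }
        Ok(Some(std::mem::take(&mut self.rows)))
    }
}

fn main() {
    let mut decoder = LineDecoder::new(2);
    // Row "b" is split across the two chunks and gets stitched back together.
    let chunks: [&[u8]; 2] = [b"a\nb", b"\nc\n"];
    let mut batches: Vec<Batch> = Vec::new();
    for &chunk in chunks.iter() {
        let mut rest = chunk;
        while !rest.is_empty() {
            let n = decoder.decode(rest).unwrap();
            rest = &rest[n..];
            if !rest.is_empty() {
                // `decode` stopped early, so a full batch is ready.
                batches.extend(decoder.flush().unwrap());
            }
        }
    }
    // End of input: emit whatever complete rows remain.
    batches.extend(decoder.flush().unwrap());
    println!("{:?}", batches); // prints [["a", "b"], ["c"]]
}
```

Returning the consumed byte count keeps the decoder free of input buffering: the caller owns the chunks and decides when to flush.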
## What changes are included in this PR?
Note: About 290 of the added lines are new tests, leaving roughly 250 lines of
implementation code.
- Add `BatchDeserializer` as a common interface:
  - `digest` consumes the input in chunks.
  - `next` attempts to deserialize the digested data and returns a
    `DeserializerOutput`, which is either a `RecordBatch`, `RequiresMoreData`,
    or `InputExhausted`.
  - `finish` signals the end of the input stream.
- Add a `Decoder` trait:
  - It mimics the `Decoder`s of `arrow-json` and `arrow-csv`.
  - `CsvDecoder` and `JsonDecoder` implement `Decoder` by forwarding methods.
- Add `DecoderDeserializer`, which implements `BatchDeserializer` for any
  format that has a `Decoder` implementation.
- Add a `deserialize_stream` function to deduplicate the deserialization logic.
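A minimal synchronous sketch of how a `DecoderDeserializer` can adapt any `Decoder` to the `BatchDeserializer` interface. Only the names `digest`, `next`, `finish`, `DeserializerOutput` with its three variants, and `DecoderDeserializer` come from the description above. The signatures, the chunk queue, the empty end-of-input marker pushed by `finish`, the hypothetical `ByteDecoder` toy format, and the hypothetical `deserialize_all` driver (a stand-in for `deserialize_stream`) are all assumptions for illustration.

```rust
use std::collections::VecDeque;

// Stand-in for arrow's RecordBatch (assumption): one row per input byte.
type Batch = Vec<u8>;

/// Simplified push-based decoder with arrow-style decode/flush.
trait Decoder {
    fn decode(&mut self, buf: &[u8]) -> Result<usize, String>;
    fn flush(&mut self) -> Result<Option<Batch>, String>;
}

/// Hypothetical toy format: each byte is a row; `batch_size` must be > 0.
struct ByteDecoder { batch_size: usize, rows: Vec<u8> }

impl Decoder for ByteDecoder {
    fn decode(&mut self, buf: &[u8]) -> Result<usize, String> {
        // Accept only as many bytes as still fit in the current batch.
        let n = (self.batch_size - self.rows.len()).min(buf.len());
        self.rows.extend_from_slice(&buf[..n]);
        Ok(n)
    }
    fn flush(&mut self) -> Result<Option<Batch>, String> {
        Ok(if self.rows.is_empty() { None } else { Some(std::mem::take(&mut self.rows)) })
    }
}

/// Variants named in the PR description.
enum DeserializerOutput { RecordBatch(Batch), RequiresMoreData, InputExhausted }

/// Method names from the PR description; signatures are assumptions.
trait BatchDeserializer {
    fn digest(&mut self, chunk: Vec<u8>) -> usize;
    fn next(&mut self) -> Result<DeserializerOutput, String>;
    fn finish(&mut self);
}

struct DecoderDeserializer<D: Decoder> {
    decoder: D,
    queue: VecDeque<Vec<u8>>, // digested chunks not yet fully decoded
    finalized: bool,
}

impl<D: Decoder> BatchDeserializer for DecoderDeserializer<D> {
    fn digest(&mut self, chunk: Vec<u8>) -> usize {
        let len = chunk.len();
        if len > 0 {
            self.queue.push_back(chunk); // empty chunks are reserved for `finish`
        }
        len
    }

    fn next(&mut self) -> Result<DeserializerOutput, String> {
        while let Some(front) = self.queue.front_mut() {
            let decoded = self.decoder.decode(front.as_slice())?;
            front.drain(..decoded); // drop the consumed prefix
            if front.is_empty() {
                self.queue.pop_front();
            }
            // No progress means a full batch is waiting, or we reached the
            // empty end-of-input marker pushed by `finish`.
            if decoded == 0 {
                if let Some(batch) = self.decoder.flush()? {
                    return Ok(DeserializerOutput::RecordBatch(batch));
                }
            }
        }
        Ok(if self.finalized {
            DeserializerOutput::InputExhausted
        } else {
            DeserializerOutput::RequiresMoreData
        })
    }

    fn finish(&mut self) {
        self.finalized = true;
        self.queue.push_back(Vec::new());
    }
}

/// Hypothetical synchronous stand-in for `deserialize_stream`.
fn deserialize_all<B: BatchDeserializer>(de: &mut B, chunks: Vec<Vec<u8>>) -> Result<Vec<Batch>, String> {
    let mut chunks = chunks.into_iter();
    let mut batches = Vec::new();
    loop {
        match de.next()? {
            DeserializerOutput::RecordBatch(batch) => batches.push(batch),
            DeserializerOutput::RequiresMoreData => match chunks.next() {
                Some(chunk) => { de.digest(chunk); }
                None => de.finish(),
            },
            DeserializerOutput::InputExhausted => return Ok(batches),
        }
    }
}

fn main() {
    let decoder = ByteDecoder { batch_size: 2, rows: Vec::new() };
    let mut de = DecoderDeserializer { decoder, queue: VecDeque::new(), finalized: false };
    let batches = deserialize_all(&mut de, vec![b"abc".to_vec(), b"de".to_vec()]).unwrap();
    println!("{:?}", batches); // prints [[97, 98], [99, 100], [101]]
}
```

Because `next` only flushes when `decode` makes no progress, every batch holds `batch_size` rows except possibly the last, and the driver only has to react to the three `DeserializerOutput` variants.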
## Are these changes tested?
Yes, the changes are covered by new tests added to the CSV and JSON modules.
## Are there any user-facing changes?
No
--
This is an automated message from the Apache Git Service.