kylebarron commented on code in PR #7015:
URL: https://github.com/apache/arrow-rs/pull/7015#discussion_r1934355453
##########
arrow-json/src/writer/encoder.rs:
##########
@@ -25,126 +27,157 @@ use arrow_schema::{ArrowError, DataType, FieldRef};
use half::f16;
use lexical_core::FormattedSize;
use serde::Serializer;
-use std::io::Write;
#[derive(Debug, Clone, Default)]
pub struct EncoderOptions {
pub explicit_nulls: bool,
pub struct_mode: StructMode,
+ pub encoder_factory: Option<Arc<dyn EncoderFactory>>,
+}
+
+/// A trait to create custom encoders for specific data types.
+///
+/// This allows overriding the default encoders for specific data types,
+/// or adding new encoders for custom data types.
+pub trait EncoderFactory: std::fmt::Debug {
+ /// Make an encoder that if returned runs before all of the default encoders.
+ /// This can be used to override how e.g. binary data is encoded so that it is an encoded string or an array of integers.
+ fn make_default_encoder<'a>(
+ &self,
+ _array: &'a dyn Array,
Review Comment:
I'd be interested in using the changes in this PR to write
[GeoJSON](https://datatracker.ietf.org/doc/html/rfc7946) from
[geoarrow-rs](https://github.com/geoarrow/geoarrow-rs).
However, this API would not be sufficient for me because it assumes that the
physical `Array` is enough to determine how to encode the data. That is not true
for geospatial data (at least for Arrow data following the [GeoArrow
specification](https://geoarrow.org/)), because the same physical layout can
describe multiple logical types.
E.g. an array of `LineString` and an array of `MultiPoint` would both be
stored as an Arrow `List[FixedSizeList[2, Float64]]`, but the extension
metadata on the `Field` would be necessary to know whether to write
[`"type": "MultiPoint"`](https://datatracker.ietf.org/doc/html/rfc7946#section-3.1.3) or
[`"type": "LineString"`](https://datatracker.ietf.org/doc/html/rfc7946#section-3.1.4) in
each JSON object.
Given that the existing JSON `Writer` API writes a `RecordBatch`, would it be
possible to access the `Field` and pass it through here, instead of passing
just the `Array`?
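
To make the ambiguity concrete, here is a minimal std-only sketch (not the arrow-rs or geoarrow-rs API). `FieldSketch`, `CoordList`, `geojson_type`, and `encode_geometry` are hypothetical names invented for illustration; the only real identifiers are the `ARROW:extension:name` metadata key and the `geoarrow.linestring` / `geoarrow.multipoint` extension names. It shows that two values with identical physical coordinates can only be encoded correctly once the field metadata is available:

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `arrow_schema::Field`; only the metadata is modeled.
pub struct FieldSketch {
    pub metadata: HashMap<String, String>,
}

/// Hypothetical stand-in for one row of `List[FixedSizeList[2, Float64]]`.
pub type CoordList = Vec<[f64; 2]>;

/// Choose the GeoJSON "type" from the extension name stored on the field.
/// The physical layout alone cannot distinguish these two cases.
pub fn geojson_type(field: &FieldSketch) -> Option<&'static str> {
    match field.metadata.get("ARROW:extension:name").map(String::as_str) {
        Some("geoarrow.linestring") => Some("LineString"),
        Some("geoarrow.multipoint") => Some("MultiPoint"),
        _ => None, // unknown or missing extension: fall back to a default encoder
    }
}

/// Encode one geometry row as a GeoJSON geometry object.
/// Returns `None` when the field carries no recognized extension name.
pub fn encode_geometry(field: &FieldSketch, coords: &CoordList) -> Option<String> {
    let ty = geojson_type(field)?;
    let body = coords
        .iter()
        .map(|c| format!("[{},{}]", c[0], c[1]))
        .collect::<Vec<_>>()
        .join(",");
    Some(format!("{{\"type\":\"{}\",\"coordinates\":[{}]}}", ty, body))
}
```

Calling `encode_geometry` with the same `coords` but different extension names yields different `"type"` values, which is why an encoder factory that sees only the `Array` cannot make this choice.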