milenkovicm opened a new issue, #16944:
URL: https://github.com/apache/datafusion/issues/16944
### Is your feature request related to a problem or challenge?
At the moment, Ballista overrides `LogicalExtensionCodec::try_decode_file_format` and
`LogicalExtensionCodec::try_encode_file_format`, providing support for:
```rust
file_format_codecs: vec![
    Arc::new(ParquetLogicalExtensionCodec {}),
    Arc::new(CsvLogicalExtensionCodec {}),
    Arc::new(JsonLogicalExtensionCodec {}),
    Arc::new(ArrowLogicalExtensionCodec {}),
    Arc::new(AvroLogicalExtensionCodec {}),
],
```
as seen at [1]. If we want to integrate Ballista with DataFusion Python, we would need
to provide a custom `LogicalExtensionCodec` that implements the same logic, or reuse
Ballista's `LogicalExtensionCodec` implementation. As these file types are supported
out of the box in DataFusion, would it make sense to implement the encoder/decoder for
them in `DefaultLogicalExtensionCodec`?
[1]:
https://github.com/milenkovicm/arrow-ballista/blob/e1e9f6ca423fd558664a7f2fb3b1bc3ed07d7db8/ballista/core/src/serde/mod.rs#L164-L165
### Describe the solution you'd like
If this proposal makes sense, implement support for it in
`DefaultLogicalExtensionCodec`, similar to what Ballista already supports:
```rust
fn try_decode_file_format(
    &self,
    buf: &[u8],
    ctx: &datafusion::prelude::SessionContext,
) -> Result<Arc<dyn datafusion::datasource::file_format::FileFormatFactory>> {
    // Unwrap the envelope: the position of the codec that encoded this format,
    // plus that codec's own payload.
    let proto = FileFormatProto::decode(buf)
        .map_err(|e| DataFusionError::Internal(e.to_string()))?;
    // Dispatch to the codec registered at the recorded position.
    let codec = self
        .file_format_codecs
        .get(proto.encoder_position as usize)
        .ok_or(DataFusionError::Internal(
            "Can't find required codec in file codec list".to_owned(),
        ))?;
    codec.try_decode_file_format(&proto.blob, ctx)
}

fn try_encode_file_format(
    &self,
    buf: &mut Vec<u8>,
    node: Arc<dyn datafusion::datasource::file_format::FileFormatFactory>,
) -> Result<()> {
    let mut blob = vec![];
    // `try_any` is a Ballista helper; it yields the position of the codec
    // that successfully encoded `node` into `blob`.
    let (encoder_position, _) =
        self.try_any(|codec| codec.try_encode_file_format(&mut blob, node.clone()))?;
    // Store the codec position alongside the encoded payload.
    let proto = FileFormatProto {
        encoder_position,
        blob,
    };
    proto
        .encode(buf)
        .map_err(|e| DataFusionError::Internal(e.to_string()))
}
```
https://github.com/milenkovicm/arrow-ballista/blob/e1e9f6ca423fd558664a7f2fb3b1bc3ed07d7db8/ballista/core/src/serde/mod.rs#L214-L215
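The snippet above relies on Ballista pieces the issue does not show: the prost message `FileFormatProto` and the helper `try_any`. Below is a minimal, std-only sketch of the same position-based dispatch, assuming `try_any` tries each codec in list order and returns the position of the first one that succeeds. All names in it (`FormatFactory`, `FormatCodec`, `CompositeCodec`, `ParquetCodec`, `CsvCodec`, `Parquet`, `Csv`) are hypothetical stand-ins, not DataFusion or Ballista APIs, and a 4-byte little-endian position prefix replaces the protobuf envelope.

```rust
use std::any::Any;
use std::sync::Arc;

// Hypothetical stand-in for DataFusion's `FileFormatFactory` trait.
trait FormatFactory: Send + Sync {
    fn as_any(&self) -> &dyn Any;
}
struct Parquet;
struct Csv { delimiter: u8 }
impl FormatFactory for Parquet {
    fn as_any(&self) -> &dyn Any { self }
}
impl FormatFactory for Csv {
    fn as_any(&self) -> &dyn Any { self }
}

// Hypothetical per-format codec mirroring the encode/decode pair on
// `LogicalExtensionCodec` that this issue is about.
trait FormatCodec: Send + Sync {
    fn try_encode(&self, buf: &mut Vec<u8>, node: &Arc<dyn FormatFactory>) -> Result<(), String>;
    fn try_decode(&self, buf: &[u8]) -> Result<Arc<dyn FormatFactory>, String>;
}

struct ParquetCodec;
impl FormatCodec for ParquetCodec {
    fn try_encode(&self, _buf: &mut Vec<u8>, node: &Arc<dyn FormatFactory>) -> Result<(), String> {
        // No Parquet options in this sketch, so the payload stays empty.
        node.as_any().downcast_ref::<Parquet>().map(|_| ()).ok_or_else(|| "not parquet".to_string())
    }
    fn try_decode(&self, _buf: &[u8]) -> Result<Arc<dyn FormatFactory>, String> {
        Ok(Arc::new(Parquet))
    }
}

struct CsvCodec;
impl FormatCodec for CsvCodec {
    fn try_encode(&self, buf: &mut Vec<u8>, node: &Arc<dyn FormatFactory>) -> Result<(), String> {
        let csv = node.as_any().downcast_ref::<Csv>().ok_or("not csv")?;
        buf.push(csv.delimiter); // the only CSV option kept in this sketch
        Ok(())
    }
    fn try_decode(&self, buf: &[u8]) -> Result<Arc<dyn FormatFactory>, String> {
        let delimiter = *buf.first().ok_or("empty csv payload")?;
        Ok(Arc::new(Csv { delimiter }))
    }
}

// Hypothetical composite codec: the codec's position in the list is written
// in front of its payload (standing in for `FileFormatProto`).
struct CompositeCodec {
    file_format_codecs: Vec<Arc<dyn FormatCodec>>,
}

impl CompositeCodec {
    // Assumed behaviour of Ballista's `try_any`: try each codec in order and
    // return the position of the first one that succeeds.
    fn try_any<T>(
        &self,
        mut f: impl FnMut(&Arc<dyn FormatCodec>) -> Result<T, String>,
    ) -> Result<(u32, T), String> {
        for (position, codec) in self.file_format_codecs.iter().enumerate() {
            if let Ok(value) = f(codec) {
                return Ok((position as u32, value));
            }
        }
        Err("no codec could encode this file format".to_string())
    }

    fn encode(&self, node: Arc<dyn FormatFactory>) -> Result<Vec<u8>, String> {
        let mut blob = Vec::new();
        let (position, ()) = self.try_any(|codec| {
            blob.clear(); // drop any bytes left by a codec that failed
            codec.try_encode(&mut blob, &node)
        })?;
        let mut buf = position.to_le_bytes().to_vec();
        buf.extend_from_slice(&blob);
        Ok(buf)
    }

    fn decode(&self, buf: &[u8]) -> Result<Arc<dyn FormatFactory>, String> {
        if buf.len() < 4 {
            return Err("buffer too short for a codec position".to_string());
        }
        let mut head = [0u8; 4];
        head.copy_from_slice(&buf[..4]);
        let codec = self
            .file_format_codecs
            .get(u32::from_le_bytes(head) as usize)
            .ok_or("Can't find required codec in file codec list")?;
        codec.try_decode(&buf[4..])
    }
}

fn main() {
    // The decoding side must register the same codecs in the same order.
    let codecs: Vec<Arc<dyn FormatCodec>> = vec![Arc::new(ParquetCodec), Arc::new(CsvCodec)];
    let codec = CompositeCodec { file_format_codecs: codecs };
    let bytes = codec.encode(Arc::new(Csv { delimiter: b';' })).unwrap();
    assert_eq!(bytes, vec![1, 0, 0, 0, b';']); // position 1, then the CSV payload
    let decoded = codec.decode(&bytes).unwrap();
    let csv = decoded.as_any().downcast_ref::<Csv>().unwrap();
    println!("decoded CSV delimiter: {:?}", csv.delimiter as char);
}
```

One consequence of this design is that the stored position is only meaningful relative to the codec list, so the encoding and decoding sides must register the same codecs in the same order. The sketch also clears the payload buffer before each attempt, so a codec that writes bytes and then fails cannot leave them behind; the snippet above reuses a single `blob` across attempts.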
### Describe alternatives you've considered
The alternative would be to keep everything as it is.
### Additional context
_No response_