zmrdltl commented on issue #5410:
URL: https://github.com/apache/arrow-rs/issues/5410#issuecomment-1962309045
To handle a Parquet file with SQL statements, the data must first be converted into a GlueSQL schema when it is read. When the Parquet file is written, the schema is converted back to Parquet's `SchemaType` enum, and each field's GlueSQL data type is mapped to the corresponding Parquet data type. The writing itself is done with `SerializedColumnWriter`.
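The per-field type conversion can be pictured as one mapping function from SQL type to Parquet physical type. The sketch below is std-only. `SqlType` and `PhysicalType` are hypothetical stand-ins; the real Parquet physical types live in `parquet::basic::Type` (e.g. `INT32`, `INT64`). It covers only cases consistent with the match arms quoted in this comment: 8-bit integers and dates are written as INT32.

```rust
// Hypothetical subset of GlueSQL column types (names are illustrative only).
#[derive(Debug, Clone, Copy, PartialEq)]
enum SqlType {
    Int8,
    Uint8,
    Int64,
    Date,
}

// Stand-in for the Parquet physical types used here
// (the real enum is `parquet::basic::Type`).
#[derive(Debug, Clone, Copy, PartialEq)]
enum PhysicalType {
    Int32,
    Int64,
}

// Picks the physical type a field is written with.
fn physical_type(t: SqlType) -> PhysicalType {
    match t {
        // 8-bit integers are widened to INT32, matching the `val as i32` casts.
        SqlType::Int8 | SqlType::Uint8 => PhysicalType::Int32,
        // Parquet's DATE logical type is stored as INT32 days since the Unix epoch.
        SqlType::Date => PhysicalType::Int32,
        SqlType::Int64 => PhysicalType::Int64,
    }
}
```

The physical type chosen per field is what decides which `ColumnWriter` variant the writing code has to match on.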
However, writing each value through a `ColumnWriter` this way causes a lot of duplication, because every combination of value type and writer variant needs its own match arm:
```rust
// Excerpt: arms of a match over (GlueSQL `Value`, parquet `ColumnWriter`)
(Value::Null, ColumnWriter::Int32ColumnWriter(ref mut typed)) => {
    // NULL: no values, definition level 0
    typed.write_batch(&[], Some(&[0]), None).map_storage_err()?;
}
(Value::Null, ColumnWriter::Int64ColumnWriter(ref mut typed)) => {
    typed.write_batch(&[], Some(&[0]), None).map_storage_err()?;
}
(Value::I8(val), ColumnWriter::Int32ColumnWriter(ref mut typed)) => {
    // Non-null value: definition level 1
    typed.write_batch(&[val as i32], Some(&[1]), None).map_storage_err()?;
}
(Value::Date(d), ColumnWriter::Int32ColumnWriter(ref mut typed)) => {
    // .. (days_since_epoch is computed from `d` here)
    typed.write_batch(&[days_since_epoch], Some(&[1]), None).map_storage_err()?;
}
(Value::U8(val), ColumnWriter::Int32ColumnWriter(ref mut typed)) => {
    typed.write_batch(&[val as i32], Some(&[1]), None).map_storage_err()?;
}
```
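One way to cut this duplication is to write the NULL case once with a small macro that expands the same body for every writer variant, and to route the remaining arms through one generic helper. The sketch below is std-only, so `TypedWriter`, `ColumnWriter`, `Value`, `for_each_writer!` and `write_one` are hypothetical stand-ins, not parquet or GlueSQL APIs; the real `write_batch` also takes repetition levels and returns parquet's own `Result<usize>`.

```rust
// Hypothetical mock of a typed column writer (NOT parquet's API): it records
// the values and definition levels it receives.
#[derive(Default)]
struct TypedWriter<T> {
    values: Vec<T>,
    def_levels: Vec<i16>,
}

impl<T: Clone> TypedWriter<T> {
    fn write_batch(&mut self, values: &[T], def_levels: Option<&[i16]>) -> Result<usize, String> {
        self.values.extend_from_slice(values);
        if let Some(levels) = def_levels {
            self.def_levels.extend_from_slice(levels);
        }
        Ok(values.len())
    }
}

// One variant per physical type, like parquet's `ColumnWriter` enum.
enum ColumnWriter {
    Int32(TypedWriter<i32>),
    Int64(TypedWriter<i64>),
    Double(TypedWriter<f64>),
}

// Hypothetical subset of the SQL value type.
enum Value {
    Null,
    I8(i8),
    U8(u8),
    I64(i64),
    F64(f64),
}

// Expands `$body` once per variant with the inner writer bound to `$w`,
// so code that is identical for every physical type is written only once.
macro_rules! for_each_writer {
    ($writer:expr, $w:ident => $body:expr) => {
        match $writer {
            ColumnWriter::Int32($w) => $body,
            ColumnWriter::Int64($w) => $body,
            ColumnWriter::Double($w) => $body,
        }
    };
}

// Writes one non-null value (definition level 1) to a writer of any type.
fn write_one<T: Clone>(w: &mut TypedWriter<T>, v: T) -> Result<(), String> {
    w.write_batch(&[v], Some(&[1])).map(|_| ())
}

fn write_value(value: Value, writer: &mut ColumnWriter) -> Result<(), String> {
    match (value, writer) {
        // A single arm covers NULL for every column type: no values, level 0.
        (Value::Null, writer) => {
            for_each_writer!(writer, w => w.write_batch(&[], Some(&[0])).map(|_| ()))
        }
        (Value::I8(v), ColumnWriter::Int32(w)) => write_one(w, v as i32),
        (Value::U8(v), ColumnWriter::Int32(w)) => write_one(w, v as i32),
        (Value::I64(v), ColumnWriter::Int64(w)) => write_one(w, v),
        (Value::F64(v), ColumnWriter::Double(w)) => write_one(w, v),
        _ => Err("value does not match column type".to_string()),
    }
}
```

The macro keeps each expansion monomorphic, so the body is type-checked separately for every variant; that is what lets one textual arm serve writers with different value types.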
So, while refactoring this code, I wrote the generic function below, but I couldn't define it without importing `ColumnValueEncoder`, which is why I asked about it. Since `ColumnWriterImpl` is a type rather than a trait, it seems difficult to apply this structure. If so, what structure should I use?
```rust
use parquet::column::writer::{encoder::ColumnValueEncoder, GenericColumnWriter};

// Writes a single NULL (definition level 0) to the given column writer.
fn write_null<T, E>(typed: &mut GenericColumnWriter<'_, E>) -> Result<(), Error>
where
    E: ColumnValueEncoder<T>,
{
    // write_batch returns the number of values written; discard it.
    typed.write_batch(&[], Some(&[0]), None).map_storage_err()?;
    Ok(())
}

fn write_column<T, E>(
    value: Value,
    col_writer: &mut GenericColumnWriter<'_, E>,
) -> Result<(), Error>
where
    E: ColumnValueEncoder<T>,
{
    match value {
        Value::Null => write_null(col_writer),
        // write_value (not shown) handles the non-null cases
        _ => write_value(col_writer, value),
    }
}
```
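A possible structure, assuming the `parquet` crate's definitions are as follows in the version in use: each `ColumnWriter` variant wraps a `ColumnWriterImpl<'a, T>` for some `T: DataType`, and `ColumnWriterImpl<'a, T>` is an alias for `GenericColumnWriter<'a, ColumnValueEncoderImpl<T>>`. If so, a helper can be generic over the data type `T: DataType` and take `&mut ColumnWriterImpl<'_, T>`; the encoder impl is then found by the compiler, so `ColumnValueEncoder` never appears in the signature. The std-only model below reproduces that type shape with hypothetical names (`DataType`, `ValueEncoder`, `EncoderImpl`, `GenericWriter`, `WriterImpl`, `write_null`, `write_one`) and omits the repetition-level argument; it illustrates the idea and has not been checked against a specific parquet release.

```rust
// Std-only model of the type structure; every name here is a hypothetical stand-in.

// Plays the role of a public data-type trait: maps a marker type to its value type.
trait DataType {
    type T: Clone;
}

struct Int32Type;
impl DataType for Int32Type {
    type T = i32;
}

struct Int64Type;
impl DataType for Int64Type {
    type T = i64;
}

// Plays the role of the encoder trait we want to avoid naming.
trait ValueEncoder {
    type Values: ?Sized;
    fn put(&mut self, values: &Self::Values) -> usize;
}

// One encoder type covering every DataType through a blanket impl.
struct EncoderImpl<T: DataType> {
    buf: Vec<T::T>,
}

impl<T: DataType> ValueEncoder for EncoderImpl<T> {
    type Values = [T::T];
    fn put(&mut self, values: &[T::T]) -> usize {
        self.buf.extend_from_slice(values);
        values.len()
    }
}

// Writer parameterised by its encoder, like a generic column writer.
struct GenericWriter<E: ValueEncoder> {
    encoder: E,
    def_levels: Vec<i16>,
}

impl<E: ValueEncoder> GenericWriter<E> {
    fn write_batch(&mut self, values: &E::Values, def_levels: Option<&[i16]>) -> usize {
        if let Some(levels) = def_levels {
            self.def_levels.extend_from_slice(levels);
        }
        self.encoder.put(values)
    }
}

// Alias with the same shape as an `...Impl<T>` writer alias.
type WriterImpl<T> = GenericWriter<EncoderImpl<T>>;

fn new_writer<T: DataType>() -> WriterImpl<T> {
    GenericWriter { encoder: EncoderImpl { buf: Vec::new() }, def_levels: Vec::new() }
}

// Bounded on the data type only: `ValueEncoder` is never named, because the
// blanket impl already proves `EncoderImpl<T>: ValueEncoder` for any `T: DataType`.
fn write_null<T: DataType>(w: &mut WriterImpl<T>) -> usize {
    w.write_batch(&[], Some(&[0]))
}

// Writes one non-null value with definition level 1.
fn write_one<T: DataType>(w: &mut WriterImpl<T>, v: T::T) -> usize {
    w.write_batch(&[v], Some(&[1]))
}
```

Under that assumption, each arm of the original match would shrink to a call such as `ColumnWriter::Int32ColumnWriter(typed) => write_null(typed)`, with the value-type-specific conversion kept in the arms that need it.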