JonasDev1 commented on issue #7690: URL: https://github.com/apache/datafusion/issues/7690#issuecomment-2370845677
I have the same use case. The main reason is that the Avro reader needs an Avro file input stream, while I only have a binary array of Avro messages (from Kafka). My current workaround is to deserialize each message to a `Value`, write them all with the Avro writer in memory, and then deserialize them again with the DataFusion Avro reader. To solve this, I would like to split the `AvroArrowArrayReader` into a reader and a converter (`Vec<Value>` to `RecordBatch`). You could then also add a `from_avro` function, similar to the one in [Spark](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.avro.functions.from_avro.html).
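For reference, a minimal sketch of the workaround described above, assuming the `apache-avro` crate and raw (unframed) Avro datum payloads; the function and variable names here are illustrative, not DataFusion API:

```rust
use apache_avro::{from_avro_datum, Schema, Writer};
use std::io::Cursor;

/// Re-encode single-datum Avro messages (e.g. Kafka payloads) into an
/// in-memory Avro object container file that a file-oriented reader accepts.
fn reencode_messages(
    schema: &Schema,
    messages: &[Vec<u8>],
) -> Result<Vec<u8>, apache_avro::Error> {
    // Step 1: deserialize each message into an Avro `Value`.
    // (Confluent-framed messages would first need the 5-byte header stripped.)
    let values = messages
        .iter()
        .map(|msg| from_avro_datum(schema, &mut Cursor::new(msg), None))
        .collect::<Result<Vec<_>, _>>()?;

    // Step 2: write all values back out with the Avro writer, in memory.
    let mut writer = Writer::new(schema, Vec::new());
    for value in values {
        writer.append(value)?;
    }

    // Step 3: these bytes are then fed to the DataFusion Avro reader, which
    // deserializes them a second time -- the redundant round trip that a
    // `Vec<Value>`-to-`RecordBatch` converter would eliminate.
    writer.into_inner()
}
```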