[GitHub] [arrow-datafusion] Igosuki opened a new issue #903: Avro table provider

GitBox Wed, 18 Aug 2021 03:38:46 -0700


Igosuki opened a new issue #903:
URL: https://github.com/apache/arrow-datafusion/issues/903



   In a platform I work on, I decided to write Avro log files so that I could 
easily close and append binary files to S3. I didn't want to bother converting 
them to another format with Spark, since Spark is what I wanted to drop in the 
first place, so I started writing what's required to read Avro as a data source 
in DataFusion.
   
   Here is the branch on my fork (I merged the nested field PR into it, but 
that can be removed):
   https://github.com/Igosuki/arrow-datafusion/tree/avro2_m
   
   I converted all the Parquet test files to Avro and plan to add a test case 
for each of them.
   
   My question is: is Avro support desirable for DataFusion, or should I just 
make a sidecar crate on my own?
   
   **Describe alternatives you've considered**
   Converting the data to JSON or Parquet to reuse the existing code.
   
   **Additional context**
   I'm not yet familiar with the new Arrow data types, and it's been a 
challenge to work out what to do with Avro union types that only express a 
nullable field. Ultimately I decided to map them to nullable fields and drop 
the union, but I had to add special cases here and there because of that.
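   
   To make the union handling concrete, here is a minimal sketch of the idea: a 
union such as `["null", "long"]` collapses to a nullable field of the one 
non-null type, and any other union is kept as a union. The `AvroSchema`, 
`DataType` and `Field` types and the `to_data_type` / `to_field` functions are 
simplified, hypothetical stand-ins written for this illustration. They are not 
the types from the avro or arrow crates, and this is not necessarily how the 
branch does it.

```rust
// Hypothetical, simplified stand-ins for illustration only; these are NOT the
// schema types from the avro or arrow crates.
#[derive(Debug, Clone, PartialEq)]
enum AvroSchema {
    Null,
    Long,
    String,
    Union(Vec<AvroSchema>),
}

#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Null,
    Int64,
    Utf8,
    Union(Vec<DataType>),
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

/// Map a schema to a data type plus a nullability flag.
fn to_data_type(schema: &AvroSchema) -> (DataType, bool) {
    match schema {
        AvroSchema::Null => (DataType::Null, true),
        AvroSchema::Long => (DataType::Int64, false),
        AvroSchema::String => (DataType::Utf8, false),
        AvroSchema::Union(variants) => {
            // Separate the "null" branch from the branches that carry data.
            let non_null: Vec<&AvroSchema> = variants
                .iter()
                .filter(|v| **v != AvroSchema::Null)
                .collect();
            let has_null = non_null.len() != variants.len();
            match non_null.as_slice() {
                // ["null", T]: drop the union and keep T as a nullable type.
                [only] if has_null => (to_data_type(only).0, true),
                // Anything else stays a union in this sketch.
                _ => (
                    DataType::Union(variants.iter().map(|v| to_data_type(v).0).collect()),
                    has_null,
                ),
            }
        }
    }
}

/// Build a named field from a schema.
fn to_field(name: &str, schema: &AvroSchema) -> Field {
    let (data_type, nullable) = to_data_type(schema);
    Field { name: name.to_string(), data_type, nullable }
}
```

   With these definitions, `to_field("ts", &AvroSchema::Union(vec![AvroSchema::Null, AvroSchema::Long]))` 
yields an `Int64` field with `nullable: true`, while a union of `Long` and 
`String` is left as a union. Keeping the collapse in one place like this is one 
way to limit the special cases elsewhere.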
   
   
   


