[https://issues.apache.org/jira/browse/AVRO-458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900415#action_12900415]
Harsh J Chouraria commented on AVRO-458:
----------------------------------------
I've put together a simple CLI tool in Python that does the following (with
some tunable options):
CSV to Avro ->
1. Pass a schema file; otherwise the tool generates one from the CSV header,
with every field typed as string.
2. Read and split each CSV record (from a list of input files) on the given
delimiter (default ',') and convert each value to its schema type.
P.S. If a value fails to convert to its schema type (say, a null where the
CSV column is supposed to hold a float), the tool checks whether that schema
field declares a default and uses it; otherwise it throws an informative
exception. I know this bends the intended meaning of 'default' in a schema,
but it's a great feature to have!
3. Write these records to an Avro data file.
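A minimal sketch of the CSV-to-Avro steps above, assuming schemas are handled as plain Python dicts (as loaded from a JSON schema file) and only primitive field types occur. The function names (`schema_from_header`, `convert_row`, `csv_to_records`) are hypothetical and not taken from avroutils. Writing the records into an Avro data file would additionally need the Avro Python library and is left out here.

```python
import csv
import io

# Hypothetical converters from Avro primitive type names to Python callables.
# Assumption: unions, enums, and other complex types are not handled.
_CONVERTERS = {
    "string": str,
    "int": int,
    "long": int,
    "float": float,
    "double": float,
    "boolean": lambda s: {"true": True, "false": False}[s.strip().lower()],
}


def schema_from_header(header, name="CsvRecord"):
    """Build an Avro record schema (as a dict) typing every column as string."""
    return {
        "type": "record",
        "name": name,
        "fields": [{"name": col, "type": "string"} for col in header],
    }


def convert_row(row, schema):
    """Convert one CSV row (a list of strings) into a dict matching the schema.

    On a conversion failure, fall back to the field's "default" if one is
    declared (the non-standard use of defaults described above); otherwise
    raise a descriptive ValueError.
    """
    fields = schema["fields"]
    if len(row) != len(fields):
        raise ValueError(f"expected {len(fields)} columns, got {len(row)}")
    record = {}
    for field, raw in zip(fields, row):
        convert = _CONVERTERS[field["type"]]
        try:
            record[field["name"]] = convert(raw)
        except (ValueError, KeyError) as exc:
            if "default" in field:
                record[field["name"]] = field["default"]
            else:
                raise ValueError(
                    f"field {field['name']!r}: cannot convert {raw!r} "
                    f"to {field['type']}"
                ) from exc
    return record


def csv_to_records(text, schema=None, delimiter=","):
    """Parse CSV text (first line is the header) into schema-typed dicts."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    if schema is None:
        schema = schema_from_header(header)
    return schema, [convert_row(row, schema) for row in reader]
```

With no schema passed, every value stays a string; with a schema whose `score` field is a `float` with `"default": 0.0`, an empty `score` cell becomes `0.0` instead of raising.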
Avro to CSV ->
1. Optionally pass a schema to read only selected fields; otherwise the file is
read with its full schema.
2. Read each record (only record schemas are supported for now) and convert
all values to strings. Multiple Avro files can be read into a single CSV file.
3. Write to a CSV file with an optional header.
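A minimal sketch of the Avro-to-CSV steps above, assuming the records have already been decoded into plain dicts (for example, by an Avro data file reader) and that the selecting schema is a record schema given as a dict. Picking fields by name is a simplification of Avro's reader-schema resolution, and the function names are hypothetical, not taken from avroutils. Records from several Avro files could be combined with `itertools.chain` before being passed in.

```python
import csv
import io


def field_names(schema):
    """Return the field names of an Avro record schema given as a dict."""
    if schema.get("type") != "record":
        raise ValueError("only record schemas are supported")
    return [f["name"] for f in schema["fields"]]


def records_to_csv(records, schema, delimiter=",", header=True):
    """Write records (dicts) as CSV text, keeping only the schema's fields.

    Hypothetical helper: every selected value is stringified, and a missing
    or None value becomes an empty cell.
    """
    names = field_names(schema)
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    if header:
        writer.writerow(names)
    for rec in records:
        writer.writerow(
            "" if rec.get(n) is None else str(rec.get(n)) for n in names
        )
    return out.getvalue()
```

Fields that are present in the records but absent from the passed schema (such as an extra `c` column) are simply dropped from the output.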
The code is currently a work in progress on GitHub at:
http://github.com/QwertyManiac/avroutils. I'll submit it as a formal patch
once it feels complete.
I'm posting this to gather suggestions: what should be extended, added, etc.?
> add tools that read/write CSV records from/to avro data files
> -------------------------------------------------------------
>
> Key: AVRO-458
> URL: https://issues.apache.org/jira/browse/AVRO-458
> Project: Avro
> Issue Type: New Feature
> Reporter: Doug Cutting
>
> It might be useful to have command-line tools that can read & write arbitrary
> CSV data from & to Avro data files.