[ 
https://issues.apache.org/jira/browse/AVRO-458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900415#action_12900415
 ] 

Harsh J Chouraria commented on AVRO-458:
----------------------------------------

I've put together a simple CLI tool in Python that does the following (with 
some tunable options):

CSV to Avro ->
  1. Pass a schema file; if none is given, the tool generates one from the CSV 
header with every field typed as string.
  2. Read and split each CSV record (from a list of input files) using the given 
delimiter (default ',') and convert each value to the type its schema field 
declares.
  p.s. If a conversion fails (say, a null where the CSV is supposed to hold a 
float), the tool checks whether the schema passed has a default for that field 
and uses it; otherwise it throws an informative exception. I know this stretches 
the meaning of 'default' in the schema, but it's a great feature to have!
  3. Write these records into an Avro data file.
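
The first two steps can be sketched roughly as follows. This is a minimal, 
standard-library-only illustration, not the code from the avroutils repository: 
the names CONVERTERS, schema_from_header, convert_row and read_records are 
hypothetical, only primitive Avro type names are handled, and the first CSV row 
is assumed to be a header.

```python
import csv

# Hypothetical mapping from Avro primitive type names to Python converters.
CONVERTERS = {
    "string": str,
    "int": int,
    "long": int,
    "float": float,
    "double": float,
    "boolean": lambda s: s.strip().lower() == "true",
}


def schema_from_header(header, name="CsvRecord"):
    """Build an all-string Avro record schema (as a dict) from a CSV header."""
    return {
        "type": "record",
        "name": name,
        "fields": [{"name": col, "type": "string"} for col in header],
    }


def convert_row(row, schema):
    """Convert one CSV row (a list of strings) to a dict typed per the schema."""
    record = {}
    for field, raw in zip(schema["fields"], row):
        try:
            record[field["name"]] = CONVERTERS[field["type"]](raw)
        except ValueError:
            # Conversion failed: fall back to the field's default if present.
            if "default" not in field:
                raise ValueError(
                    "cannot convert %r to %s for field %r and no default is set"
                    % (raw, field["type"], field["name"])
                )
            record[field["name"]] = field["default"]
    return record


def read_records(lines, schema=None, delimiter=","):
    """Parse CSV lines; assumes the first row is a header (an assumption)."""
    reader = csv.reader(lines, delimiter=delimiter)
    header = next(reader)
    schema = schema or schema_from_header(header)
    return schema, [convert_row(row, schema) for row in reader]
```

The resulting dicts could then be appended to an Avro data file with the Avro 
Python library's data file writer, which covers step 3.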

Avro to CSV ->
 1. Pass a schema to read only selected fields; otherwise the file is read with 
its full schema.
 2. Read each record [only works with records for now] and convert all data to 
string type. Can read from many Avro files into one CSV file.
 3. Write to a CSV file with an optional header.
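
The reverse direction might look like the sketch below. Again this is only an 
illustration under assumptions: records_to_csv is a hypothetical name, records 
are taken to be plain dicts (which is how the Avro Python library's data file 
reader yields them), and field_names stands in for the fields of whichever 
schema is used for reading.

```python
import csv
import io


def records_to_csv(records, field_names, out, delimiter=",", write_header=True):
    """Write dict records to `out` as CSV, converting every value to a string.

    Only the fields listed in `field_names` are written, so passing a subset
    of the full schema's fields gives a selective read. Returns the row count.
    """
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    if write_header:
        writer.writerow(field_names)
    count = 0
    for record in records:
        # str() is applied to everything, so a None value becomes "None".
        writer.writerow([str(record[name]) for name in field_names])
        count += 1
    return count


# Usage: select two of three fields and write them with a header row.
out = io.StringIO()
rows = [{"id": 1, "name": "a", "x": 2.5}, {"id": 2, "name": "b", "x": 3.0}]
written = records_to_csv(rows, ["id", "x"], out)
```

To read from many Avro files into one CSV file, the records from each reader 
could be chained with itertools.chain before being passed in.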

The code (WIP) currently resides on GitHub at 
http://github.com/QwertyManiac/avroutils, but I'll submit it as a formal 
patch once it feels complete.

I'm posting this comment to get suggestions on what to extend, etc.

> add tools that read/write CSV records from/to avro data files
> -------------------------------------------------------------
>
>                 Key: AVRO-458
>                 URL: https://issues.apache.org/jira/browse/AVRO-458
>             Project: Avro
>          Issue Type: New Feature
>            Reporter: Doug Cutting
>
> It might be useful to have command-line tools that can read & write arbitrary 
> CSV data from & to Avro data files.

