Jeff, I've answered inline. Thanks for using the processor, sorry it isn't clear what's happening.

rb

On 11/05/2015 01:59 PM, Jeff wrote:
I built a simple flow that reads a tab separated file and attempts to convert 
to Avro.

ConvertCSVtoAvro just says that the conversion failed.

Where can I find more information on what the failure was?

Information about failures is added to the "errors" attribute on files emitted to the failure relationship. Unfortunately, right now the files aren't filtered to just the failed rows. That's something we need to fix, but it does accumulate error messages so you get something like:

"NumberFormatException: 'turkey' is not an integer (1,234 similar errors)"
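To illustrate the behavior described above (not the actual NiFi/Kite source), here is a minimal sketch of how per-row error messages could be collapsed into a single summary string like the one shown; the function name `summarize_errors` is hypothetical:

```python
# Hedged sketch: accumulate repeated per-row error messages into one
# summary of the form "message (N similar errors)".
from collections import Counter

def summarize_errors(errors):
    """Collapse duplicate messages; extra occurrences become a count."""
    counts = Counter(errors)
    parts = []
    for msg, n in counts.items():
        if n > 1:
            # n - 1 "similar" errors beyond the first, comma-grouped.
            parts.append("%s (%s similar errors)" % (msg, format(n - 1, ",")))
        else:
            parts.append(msg)
    return "; ".join(parts)
```

For example, three identical `NumberFormatException` messages would collapse to one entry with "(2 similar errors)" appended.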

Using the same sample tab separated file, I create a JSON file out of it.

The JSON to Avro processor also fails with very little explication.

These processors are basically the same on the inside. :)

Same place for errors. I think the problem is likely that some of the values are failing to convert to the Avro type you've selected.


With regard to the ConvertCSVtoAvro processor
        Since my file is tab delimited, do I simply open the "CSV delimiter" 
property, delete the comma, and hit the tab key, or is there a special syntax like ^t?
        My data has no CSV quote character, so do I leave this as ", delete it, 
or check the "empty" box?

This could definitely be a problem. The delimiter is what you want to change. It works with both a literal tab character (I usually paste it in, since the browser treats the tab key as a movement key) and with \t, though I think there's a bug where the validation rejects 2-character delimiters like \t. I should fix that.
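The two ways of specifying a tab delimiter mentioned above can be illustrated in plain Python (this is not NiFi code; the unescaping step is an assumption about how a typed "\t" would have to be handled):

```python
# A literal tab character works directly as a delimiter.
import csv
import io

data = "id\tname\n1\tturkey\n"
rows = list(csv.reader(io.StringIO(data), delimiter="\t"))

# A "\t" typed into a text field arrives as TWO characters (backslash, t)
# and must be unescaped before it matches a real tab.
typed = "\\t"
unescaped = typed.encode().decode("unicode_escape")
assert unescaped == "\t"
```

This is why a validator that only accepts single-character delimiters rejects "\t" even though the processor could handle it after unescaping.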

With regard to the ConvertJSONtoAvro
        What is the expected JSON source file to look like?
                [
                 {fields values … },
                 {fields values …}
                ]
        Or
                 {fields values … }
                 {fields values …}
        or something else.

This should be the second case. The JSON to Avro processor can't handle a JSON list as the root just yet. You should simply concatenate the JSON objects. The whitespace between them doesn't matter.
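As a sketch of what "concatenated JSON" means here, the standard library's `json.JSONDecoder.raw_decode` can pull one top-level object after another out of a single string (this is just an illustration of the input format, not how the processor parses it):

```python
# Hedged sketch: iterate over concatenated JSON records (one object
# after another, no enclosing list) using raw_decode.
import json

def read_concatenated_json(text):
    """Yield each top-level JSON object from a concatenated stream."""
    decoder = json.JSONDecoder()
    idx = 0
    while idx < len(text):
        # Skip whitespace between records; as noted above, it doesn't matter.
        while idx < len(text) and text[idx].isspace():
            idx += 1
        if idx >= len(text):
            break
        obj, end = decoder.raw_decode(text, idx)
        yield obj
        idx = end
```

For example, the input `{"a": 1}\n{"a": 2}` yields two separate records.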

rb


--
Ryan Blue
Software Engineer
Cloudera, Inc.
