Jeff, I've answered inline. Thanks for using the processor, and sorry it
isn't clear what's happening.
rb
On 11/05/2015 01:59 PM, Jeff wrote:
I built a simple flow that reads a tab separated file and attempts to convert
to Avro.
ConvertCSVtoAvro just says that the conversion failed.
Where can I find more information on what the failure was?
Information about failures is added to the "errors" attribute on files
emitted to the failure relationship. Unfortunately, right now the files
aren't filtered to just the failed rows. That's something we need to
fix, but it does accumulate error messages so you get something like:
"NumberFormatException: 'turkey' is not an integer (1,234 similar
errors)"
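To illustrate the kind of accumulation I mean, here is a rough Python sketch. It is not the processor's actual code: `summarize_errors` is a made-up name, and grouping messages by the exception name before the first colon is my assumption for the example.

```python
from collections import Counter

def summarize_errors(messages):
    """Collapse repeated errors into 'first message (N similar errors)'."""
    counts = Counter()
    first_seen = {}
    for msg in messages:
        # Assumption: the text before the first colon names the error kind.
        kind = msg.split(":", 1)[0]
        counts[kind] += 1
        # Keep only the first message of each kind as the representative.
        first_seen.setdefault(kind, msg)
    parts = []
    for kind, msg in first_seen.items():
        similar = counts[kind] - 1
        if similar:
            noun = "error" if similar == 1 else "errors"
            parts.append(f"{msg} ({similar:,} similar {noun})")
        else:
            parts.append(msg)
    return ", ".join(parts)

sample_errors = [
    "NumberFormatException: 'turkey' is not an integer",
    "NumberFormatException: 'ham' is not an integer",
    "NumberFormatException: 'cheese' is not an integer",
]
summary = summarize_errors(sample_errors)
print(summary)
```

Only the first message of each kind is kept, so the attribute stays short even when thousands of rows fail.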
Using the same sample tab separated file, I create a JSON file out of it.
The JSON to Avro processor also fails with very little explanation.
These processors are basically the same on the inside. :)
Same place for errors. I think the problem is likely that some of the
values are failing to convert to the Avro type you've selected.
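If you want to find the offending values before running the flow, a quick check outside NiFi can help. This is only a Python sketch under my own assumptions: `find_bad_values` is a hypothetical helper, the rows are already parsed into dicts, and Python's `int` stands in for the Avro type, so its rules may differ slightly from what the processor accepts.

```python
def find_bad_values(rows, column, convert=int):
    """Return (row_number, value) pairs that fail to convert."""
    bad = []
    for row_number, row in enumerate(rows, start=1):
        value = row[column]
        try:
            convert(value)
        except (ValueError, TypeError):
            # Record where the failure happened and what the value was.
            bad.append((row_number, value))
    return bad

sample_rows = [{"count": "3"}, {"count": "turkey"}, {"count": "12"}]
bad_values = find_bad_values(sample_rows, "count")
print(bad_values)
```

Passing `convert=float` (or another callable) checks a different target type.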
With regard to the ConvertCSVtoAvro processor
Since my file is tab delimited, do I simply open the "CSV delimiter"
property, delete the "," and hit the tab key, or is there a special syntax
like ^t? My data has no CSV quote character, so do I leave this as ",
delete it, or check the empty box?
This could definitely be a problem. The "CSV delimiter" property is the
one you want. It works with both a literal tab character (I usually paste
it in since the browser uses the tab key for movement) and with \t, though
I think there's a bug in the validation that rejects 2-character
delimiters. I should fix that.
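To confirm the file itself splits cleanly on tabs, you can parse it outside NiFi first. The following is a minimal Python sketch, not the processor's code: `normalize_delimiter` and `parse_rows` are hypothetical helpers, and treating the two characters \t as a real tab is an assumption that mirrors what I described above. Quoting is turned off because your data has no quote character.

```python
import csv
import io

def normalize_delimiter(value):
    """Map the two-character text backslash-t to a real tab character."""
    return "\t" if value == "\\t" else value

def parse_rows(text, delimiter_property):
    """Split delimited text into rows of fields, with quoting disabled."""
    reader = csv.reader(
        io.StringIO(text),
        delimiter=normalize_delimiter(delimiter_property),
        quoting=csv.QUOTE_NONE,  # the data has no quote character
    )
    return list(reader)

rows = parse_rows("id\tname\n1\tturkey\n", "\\t")
print(rows)
```

If every row comes back with the same number of fields, the delimiter is probably not what is causing the failure.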
With regard to the ConvertJSONtoAvro
What is the expected JSON source file to look like?
[
{fields values … },
{fields values …}
]
Or
{fields values … }
{fields values …}
or something else.
This should be the second case. The JSON to Avro processor can't handle
JSON lists as the root just yet. You should simply concatenate the JSON
objects. The whitespace between them doesn't matter.
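If your file currently has a list at the root, a small script can rewrite it as concatenated objects. A minimal Python sketch, assuming the whole file fits in memory; `array_to_concatenated` is a made-up name for this example.

```python
import json

def array_to_concatenated(text):
    """Rewrite a JSON array of records as one JSON object per line."""
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array at the root")
    # Newlines are used as separators only for readability.
    return "\n".join(json.dumps(record) for record in records)

sample = '[{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]'
converted = array_to_concatenated(sample)
print(converted)
```

The output has one object per line, which matches the second layout in your question.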
rb
--
Ryan Blue
Software Engineer
Cloudera, Inc.