Joe,

Using the CSV Headers to determine the schema is currently the only "dynamic" schema strategy, so it will be tricky to use with the other Readers/Writers and associated processors (which require an explicit schema). This should be alleviated by NIFI-3921 [1]. For this first release, I believe the approach would be to identify the various schemas for your incoming/outgoing data, create a Schema Registry containing all of them, and then configure the various Record Readers/Writers to use those schemas.
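For example, a registry entry is just an Avro schema whose fields match your CSV columns. Assuming (hypothetically) a file with columns id, name, and amount, the schema you'd add to an AvroSchemaRegistry might look like:

```json
{
  "type": "record",
  "name": "my_csv_record",
  "fields": [
    {"name": "id",     "type": "string"},
    {"name": "name",   "type": "string"},
    {"name": "amount", "type": "string"}
  ]
}
```

(The column names and record name here are made up for illustration; since you're providing the schema explicitly, you could also use real types like "int" or "double" instead of all strings.)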
There were some issues during development related to using the incoming vs. outgoing schema for various record operations; if QueryRecord seems to be using the output schema for querying, then it is likely a bug. However, in this case it might just be that you need an explicit schema for your Writer that matches the input schema (which is inferred from the CSV header). The CSV Header inference currently assumes all fields are Strings, so a nominal schema would have the same number of fields as columns, each with type String.

If you don't know the number of columns and/or the column names are dynamic per CSV file, I believe we have a gap here (for now). I thought of maybe having an InferRecordSchema processor that would fill in the avro.schema attribute for use in various downstream record readers/writers, but inferring schemas in general is not a trivial task. An easier interim solution might be an AddSchemaAsAttribute processor, which takes a Reader to parse the records and determine the schema (whether dynamic or static), sets the avro.schema attribute on the original incoming flow file, and then transfers the original flow file. This would require two reads, one by AddSchemaAsAttribute and one by the downstream record processor, but it should be fairly easy to implement. Then again, since new features would go into 1.3.0, hopefully NIFI-3921 will be implemented by then, rendering all this moot :)

Regards,
Matt

[1] https://issues.apache.org/jira/browse/NIFI-3921

On Fri, May 19, 2017 at 12:25 PM, Joe Gresock <[email protected]> wrote:

> I've tried a couple different configurations of CSVReader /
> JsonRecordSetWriter with the QueryRecord processor, and I don't think I
> quite have the usage down yet.
>
> Can someone give a basic example configuration in the following 2
> scenarios? I followed most of Matt Burgess's response to the post titled
> "How to use ConvertRecord Processor", but I don't think it tells the whole
> story.
>
> 1) QueryRecord, converting CSV to JSON, using only the CSV headers to
> determine the schema. (I tried selecting Use String Fields from Header in
> CSVReader, but the processor really seems to want to use the
> JsonRecordSetWriter to determine the schema)
>
> 2) QueryRecord, converting CSV to JSON, using a cached avro schema. I
> imagine I need to use InferAvroSchema here, but I'm not sure how to cache
> it in the AvroSchemaRegistry. Also not quite sure how to configure the
> properties of each controller service in this case.
>
> Any help would be appreciated.
>
> Joe
>
> --
> I know what it is to be in need, and I know what it is to have plenty. I
> have learned the secret of being content in any and every situation,
> whether well fed or hungry, whether living in plenty or in want. I can do
> all this through him who gives me strength. *-Philippians 4:12-13*
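P.S. To make the "nominal schema" idea above concrete: given a CSV header line, the header strategy implies an Avro record with one String field per column. A rough Python sketch of that derivation (illustrative only, not NiFi's actual inference code):

```python
import csv
import io
import json

def nominal_schema(csv_text, record_name="nifi_record"):
    """Build the all-String Avro schema implied by a CSV header line."""
    # The first row of the CSV is the header; each column becomes a field.
    header = next(csv.reader(io.StringIO(csv_text)))
    return {
        "type": "record",
        "name": record_name,
        "fields": [{"name": col, "type": "string"} for col in header],
    }

schema = nominal_schema("id,name,amount\n1,Alice,3.50\n")
print(json.dumps(schema, indent=2))
```

Same number of fields as columns, every field typed "string" — which is why a Writer configured with this schema would match what the CSV header inference produces.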
