Just adding to what Joey said, there was a previous discussion about something like this:
http://apache-nifi-developer-list.39713.n7.nabble.com/Looking-for-feedback-on-my-WIP-Design-td13097.html

As for the Avro schema, here is an example of how to get access to the
schema from a stream callback:

https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-avro-bundle/nifi-avro-processors/src/main/java/org/apache/nifi/processors/avro/ExtractAvroMetadata.java#L174-L177

On Thu, Dec 29, 2016 at 10:35 AM, Joey Frazee <[email protected]> wrote:

> Michael, I think you’re right to call this out. I frequently find myself
> stringing together flows with ExecuteScripts (which you should be able to
> use to pull a schema out by creating an Avro DataFileStream from the
> InputStream and then calling getSchema().toString()) or conversions to/from
> JSON and Avro to handle all the scenarios.
>
> I think the heart of the solution shouldn’t just be the addition of an
> output attribute including the schema, but something generic like you
> mention in your (4), especially considering that there are at least 7
> issues [1-7] open for variations on this. Instead of just a converter
> processor, though, it’d probably be smart to make it some kind of
> controller service so it can be easily exposed to other processors like
> ExecuteSQL and QueryDatabaseTable.
>
> -joey
>
> 1. https://issues.apache.org/jira/browse/NIFI-2743
> 2. https://issues.apache.org/jira/browse/NIFI-1623
> 3. https://issues.apache.org/jira/browse/NIFI-1623
> 4. https://issues.apache.org/jira/browse/NIFI-1702
> 5. https://issues.apache.org/jira/browse/NIFI-1704
> 6. https://issues.apache.org/jira/browse/NIFI-1398
> 7. https://issues.apache.org/jira/browse/NIFI-2725
>
> > On Dec 28, 2016, at 3:18 PM, Knapp, Michael <[email protected]> wrote:
> >
> > Nifi Devs,
> >
> > I noticed you have two processors (ExecuteSQL and QueryDatabaseTable)
> > that perform SQL select statements and put the results into a flow file.
> > While I am not sure what their difference is, I did notice that they both
> > produce Avro, and the schema is inferred from the result set. While the
> > schema is included in the output file’s contents, I am not aware of any
> > easy way to get that from a *StreamCallback. So I am wondering:
> >
> > 1. Could we update the processor to support multiple output formats?
> >    I think CSV should definitely be supported. Parquet might also be
> >    useful for me. JSON is an option, but since you already have a
> >    ConvertAvroToJSON processor that is not a big deal for me.
> >
> > 2. Could we update the processor to include the schema as one of the
> >    output flow file attributes?
> >
> > 3. Is there any utility to get an Avro schema from the input stream
> >    callback?
> >
> > 4. Has anybody thought about writing a processor to convert Avro to
> >    CSV? Or even something more generic than that, a generic format
> >    conversion processor? It could support CSV, JSON, Avro, Parquet,
> >    XML, and possibly others.
> >
> > Please let me know,
> >
> > Michael Knapp
> > Capital One
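
For reference, the approach discussed above (read the schema from the stream rather than from an attribute) works because an Avro container file stores its schema as JSON in the header metadata, under the key avro.schema. Below is a minimal, JDK-only sketch of reading just that header, so it runs without the Avro jar. The names AvroHeaderSchema and readSchema are hypothetical, not NiFi or Avro APIs, and the sketch assumes the stream is positioned at the start of the file. In a real processor, Avro's own DataFileStream is the better choice.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper (not part of NiFi or Avro): reads only the container-file header. */
final class AvroHeaderSchema {
    // Avro object container files start with the bytes 'O', 'b', 'j', 1.
    private static final byte[] MAGIC = {'O', 'b', 'j', 1};
    // Arbitrary safety cap for this sketch, to avoid huge allocations on corrupt input.
    private static final int MAX_ENTRY_BYTES = 16 * 1024 * 1024;

    private AvroHeaderSchema() {}

    /** Returns the JSON schema stored under "avro.schema", or null if the header has none. */
    static String readSchema(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        byte[] magic = new byte[MAGIC.length];
        data.readFully(magic);
        if (!Arrays.equals(magic, MAGIC)) {
            throw new IOException("Not an Avro container file");
        }
        String schema = null;
        // The metadata is an Avro map: blocks of (count, entries...), ended by a count of 0.
        for (long count = readLong(data); count != 0; count = readLong(data)) {
            if (count < 0) {
                count = -count;
                readLong(data); // a negative count is followed by the block size in bytes
            }
            for (long i = 0; i < count; i++) {
                String key = new String(readBytes(data), StandardCharsets.UTF_8);
                byte[] value = readBytes(data);
                if ("avro.schema".equals(key)) {
                    schema = new String(value, StandardCharsets.UTF_8);
                }
            }
        }
        return schema;
    }

    /** Decodes one Avro long: variable-length, zigzag encoded. */
    private static long readLong(DataInputStream in) throws IOException {
        long raw = 0;
        int shift = 0;
        while (true) {
            int b = in.readUnsignedByte();
            raw |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break;
            }
            shift += 7;
            if (shift > 63) {
                throw new IOException("Varint too long");
            }
        }
        return (raw >>> 1) ^ -(raw & 1);
    }

    /** Reads a length-prefixed byte sequence (used for both map keys and values). */
    private static byte[] readBytes(DataInputStream in) throws IOException {
        long length = readLong(in);
        if (length < 0 || length > MAX_ENTRY_BYTES) {
            throw new IOException("Unreasonable length in header: " + length);
        }
        byte[] buffer = new byte[(int) length];
        in.readFully(buffer);
        return buffer;
    }
}
```

Inside a stream callback this could be called as AvroHeaderSchema.readSchema(in). With the Avro library on the classpath, new DataFileStream<>(in, new GenericDatumReader<GenericRecord>()).getSchema().toString() should yield an equivalent schema, though the JSON formatting may differ.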
