Matt Burgess created NIFI-4109:
----------------------------------

             Summary: Implement an InferRecordSchema processor
                 Key: NIFI-4109
                 URL: https://issues.apache.org/jira/browse/NIFI-4109
             Project: Apache NiFi
          Issue Type: New Feature
          Components: Extensions
            Reporter: Matt Burgess


Currently a record schema (for use in record-aware processors) must be provided 
by an attribute, a Schema Registry, or embedded in the flow file, and thus must 
be determined ahead of time. For formats that do not carry a schema (e.g., CSV, 
JSON) and for flows whose files' schemas vary or are otherwise not known a 
priori, it would be helpful to have a processor that can infer the schema 
from the content. It could have any/all of the following features:

- Record-awareness: The existing InferAvroSchema can be used for CSV and JSON 
with non-record-aware processors/flows, although it does not currently support 
Avro logical types such as timestamp (see NIFI-3000). The benefit of 
record-awareness is that better inference can be made by inspecting each record 
in a flow file.
- Type inference: Should include the primitive types (numeric, string) as well 
as the Avro logical types (time, date, timestamp, etc.).
- Generate Schema in attribute: Recommend "avro.schema" be used as the output 
attribute, as this is the default for most RecordWriters.
- Publish Schema to Registry: This is an advanced feature that could be split 
out into its own Jira due to scope concerns.
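
To make the record-awareness and type-inference items concrete, here is a rough 
sketch in Python (not NiFi code, and not a proposed implementation). It inspects 
every record, widens a field's type when records disagree, marks fields that are 
null or missing as nullable, and writes the result under "avro.schema". The 
names classify, widen, and infer_schema, the record name "InferredRecord", and 
the two date/time patterns are all assumptions made for illustration. Nested 
values are simply treated as strings here.

```python
import json
from datetime import datetime

# Avro representation for each internal type name used in this sketch.
AVRO_TYPES = {
    "boolean": "boolean",
    "long": "long",
    "double": "double",
    "string": "string",
    "date": {"type": "int", "logicalType": "date"},
    "timestamp": {"type": "long", "logicalType": "timestamp-millis"},
}

# Hypothetical patterns; a real processor would likely make these configurable.
PATTERNS = (("date", "%Y-%m-%d"), ("timestamp", "%Y-%m-%dT%H:%M:%S"))


def classify(value):
    """Return an internal type name for one value, or None for null."""
    if value is None:
        return None
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        for name, fmt in PATTERNS:
            try:
                datetime.strptime(value, fmt)
                return name
            except ValueError:
                pass
    return "string"  # plain strings, and (as a simplification) nested values


def widen(a, b):
    """Pick a type that can hold values of both a and b."""
    if a == b:
        return a
    if {a, b} == {"long", "double"}:
        return "double"
    return "string"


def infer_schema(records, name="InferredRecord"):
    """Infer an Avro record schema (as a dict) by inspecting every record."""
    types, present, nullable, total = {}, {}, set(), 0
    for record in records:
        total += 1
        for field, value in record.items():
            present[field] = present.get(field, 0) + 1
            kind = classify(value)
            if kind is None:
                nullable.add(field)
                types.setdefault(field, None)
            else:
                prev = types.get(field)
                types[field] = kind if prev is None else widen(prev, kind)
    fields = []
    for field, kind in types.items():
        avro_type = AVRO_TYPES[kind or "string"]  # all-null fields fall back to string
        if field in nullable or present[field] < total:
            avro_type = ["null", avro_type]
        fields.append({"name": field, "type": avro_type})
    return {"type": "record", "name": name, "fields": fields}


sample = [
    {"id": 1, "price": 3, "created": "2017-06-22T10:15:00", "note": None},
    {"id": 2, "price": 4.5, "created": "2017-06-23T08:00:00", "note": "rush"},
]
schema = infer_schema(sample)
# Emit the schema as a flow file attribute, as suggested above.
attributes = {"avro.schema": json.dumps(schema)}
```

Looking only at the first record would give "price" an integer type; seeing the 
second record widens it to double, which is the kind of improvement inspecting 
each record allows. Note that declaring a logical type only describes the 
schema; the field values would still need to be converted to match it.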



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)