[
https://issues.apache.org/jira/browse/NIFI-4142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16096461#comment-16096461
]
ASF GitHub Bot commented on NIFI-4142:
--------------------------------------
Github user markap14 commented on a diff in the pull request:
https://github.com/apache/nifi/pull/2015#discussion_r128806560
--- Diff: nifi-commons/nifi-record/src/main/java/org/apache/nifi/serialization/RecordReader.java ---
@@ -38,14 +38,35 @@
 public interface RecordReader extends Closeable {
     /**
-     * Returns the next record in the stream or <code>null</code> if no more records are available.
+     * Returns the next record in the stream or <code>null</code> if no more records are available. Schema enforcement will be enabled.
      *
      * @return the next record in the stream or <code>null</code> if no more records are available.
      *
      * @throws IOException if unable to read from the underlying data
      * @throws MalformedRecordException if an unrecoverable failure occurs when trying to parse a record
+     * @throws SchemaValidationException if a Record contains a field that violates the schema and cannot be coerced into the appropriate field type.
      */
-    Record nextRecord() throws IOException, MalformedRecordException;
+    default Record nextRecord() throws IOException, MalformedRecordException {
+        return nextRecord(true);
+    }
+
+    /**
+     * Reads the next record from the underlying stream. If schema enforcement is enabled, then any field in the Record whose type does not
+     * match the schema will be coerced to the correct type and a MalformedRecordException will be thrown if unable to coerce the data into
+     * the correct type. If schema enforcement is disabled, then no type coercion will occur. As a result, calling
+     * {@link Record#getValue(org.apache.nifi.serialization.record.RecordField)}
+     * may return any type of Object, such as a String or another Record, even though the schema indicates that the field must be an integer.
+     *
+     * @param enforceSchema whether or not fields in the Record should be validated against the schema and coerced when necessary
+     *
+     * @return the next record in the stream or <code>null</code> if no more records are available
+     * @throws IOException if unable to read from the underlying data
+     * @throws MalformedRecordException if an unrecoverable failure occurs when trying to parse a record, or a Record contains a field
+     * that violates the schema and cannot be coerced into the appropriate field type.
+     * @throws SchemaValidationException if a Record contains a field that violates the schema and cannot be coerced into the appropriate
+     * field type and schema enforcement is enabled
+     */
+    Record nextRecord(boolean enforceSchema) throws IOException, MalformedRecordException;
--- End diff --
I think I'd actually rather split this concept into two separate variables
here: boolean coerceTypes and boolean dropUnknownRecords. That makes it very
explicit what is happening, and I don't think 'strict' vs. 'lenient' really
conveys those two semantics as well as I'd like.
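To make the two-flag idea concrete, here is a minimal sketch of how the two
booleans could act independently on a single record. It is not the NiFi API:
the class TwoFlagReaderSketch, the read() helper, and the use of plain Maps in
place of Record and RecordSchema are all hypothetical. It also assumes that
dropUnknownRecords means "drop fields that the schema does not define", and it
only coerces String values to Integer.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not NiFi code: plain Maps stand in for Record and RecordSchema.
class TwoFlagReaderSketch {

    // Applies the two flags independently to one raw record.
    // schema maps a field name to the Java type that the field should have.
    static Map<String, Object> read(Map<String, Object> raw,
                                    Map<String, Class<?>> schema,
                                    boolean coerceTypes,
                                    boolean dropUnknownRecords) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : raw.entrySet()) {
            Class<?> expected = schema.get(entry.getKey());
            if (expected == null) {
                // Field is not in the schema: keep it only if we are not dropping unknowns.
                if (!dropUnknownRecords) {
                    out.put(entry.getKey(), entry.getValue());
                }
                continue;
            }
            Object value = entry.getValue();
            // Assumption: only String -> Integer coercion is shown here.
            if (coerceTypes && expected == Integer.class && value instanceof String) {
                value = Integer.valueOf((String) value); // throws NumberFormatException on bad input
            }
            out.put(entry.getKey(), value);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Class<?>> schema = new HashMap<>();
        schema.put("id", Integer.class);

        Map<String, Object> raw = new LinkedHashMap<>();
        raw.put("id", "42");
        raw.put("extra", "x");

        System.out.println(read(raw, schema, true, true));
        System.out.println(read(raw, schema, false, false));
    }
}
```

With separate flags, a caller can ask for coercion while keeping unknown
fields, or the reverse, which a single strict/lenient switch cannot express.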
> Implement a ValidateRecord Processor
> ------------------------------------
>
> Key: NIFI-4142
> URL: https://issues.apache.org/jira/browse/NIFI-4142
> Project: Apache NiFi
> Issue Type: New Feature
> Components: Extensions
> Reporter: Mark Payne
> Assignee: Mark Payne
> Fix For: 1.4.0
>
>
> We need a processor that is capable of validating that all Records in a
> FlowFile adhere to the proper schema.
> The Processor should be configured with a Record Reader and should route each
> record to either 'valid' or 'invalid' based on whether or not the record
> adheres to the reader's schema. A record would be invalid in any of the
> following cases:
> - Missing field that is required according to the schema
> - Extra field that is not present in schema (it should be configurable
> whether or not this is a failure)
> - Field requires coercion and strict type checking is enabled (this should
> also be configurable)
> - Field is invalid, such as the value "hello" when it should be an integer
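
The four cases above can be sketched as a small validation routine. This is
only an illustration under stated assumptions, not the ValidateRecord
implementation: ValidateRecordSketch, FieldSpec, validate(), and isCoercible()
are hypothetical names, plain Maps stand in for the record and the schema, and
the only coercion considered is String to Integer.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not NiFi code.
class ValidateRecordSketch {

    // Hypothetical schema entry: the expected type and whether the field is required.
    static final class FieldSpec {
        final Class<?> type;
        final boolean required;

        FieldSpec(Class<?> type, boolean required) {
            this.type = type;
            this.required = required;
        }
    }

    // Returns the list of violations; an empty list means the record is valid.
    static List<String> validate(Map<String, Object> record,
                                 Map<String, FieldSpec> schema,
                                 boolean allowExtraFields,
                                 boolean strictTypeChecking) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, FieldSpec> entry : schema.entrySet()) {
            String name = entry.getKey();
            FieldSpec spec = entry.getValue();
            Object value = record.get(name);
            if (value == null) {
                // Case 1: a required field is missing.
                if (spec.required) {
                    problems.add("missing required field: " + name);
                }
                continue;
            }
            if (spec.type.isInstance(value)) {
                continue; // already the right type
            }
            if (strictTypeChecking) {
                // Case 3: the value would need coercion, but strict checking forbids it.
                problems.add("field requires coercion: " + name);
            } else if (!isCoercible(value, spec.type)) {
                // Case 4: the value cannot be coerced at all, e.g. "hello" for an integer.
                problems.add("invalid value for field: " + name);
            }
        }
        if (!allowExtraFields) {
            // Case 2: fields that the schema does not define.
            for (String name : record.keySet()) {
                if (!schema.containsKey(name)) {
                    problems.add("extra field: " + name);
                }
            }
        }
        return problems;
    }

    // Assumption: only String -> Integer coercion is modeled.
    static boolean isCoercible(Object value, Class<?> type) {
        if (type == Integer.class && value instanceof String) {
            try {
                Integer.parseInt((String) value);
                return true;
            } catch (NumberFormatException e) {
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, FieldSpec> schema = new LinkedHashMap<>();
        schema.put("id", new FieldSpec(Integer.class, true));

        Map<String, Object> record = new LinkedHashMap<>();
        record.put("id", "hello");
        record.put("extra", "x");

        System.out.println(validate(record, schema, false, false));
    }
}
```

A processor built along these lines would route a record to 'valid' when the
returned list is empty and to 'invalid' otherwise.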