[ https://issues.apache.org/jira/browse/NIFI-4142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16096461#comment-16096461 ]

ASF GitHub Bot commented on NIFI-4142:
--------------------------------------

Github user markap14 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2015#discussion_r128806560
  
    --- Diff: nifi-commons/nifi-record/src/main/java/org/apache/nifi/serialization/RecordReader.java ---
    @@ -38,14 +38,35 @@
     public interface RecordReader extends Closeable {
     
         /**
    -     * Returns the next record in the stream or <code>null</code> if no more records are available.
    +     * Returns the next record in the stream or <code>null</code> if no more records are available. Schema enforcement will be enabled.
          *
          * @return the next record in the stream or <code>null</code> if no more records are available.
          *
          * @throws IOException if unable to read from the underlying data
          * @throws MalformedRecordException if an unrecoverable failure occurs when trying to parse a record
    +     * @throws SchemaValidationException if a Record contains a field that violates the schema and cannot be coerced into the appropriate field type.
          */
    -    Record nextRecord() throws IOException, MalformedRecordException;
    +    default Record nextRecord() throws IOException, MalformedRecordException {
    +        return nextRecord(true);
    +    }
    +
    +    /**
    +     * Reads the next record from the underlying stream. If schema enforcement is enabled, then any field in the Record whose type does not
    +     * match the schema will be coerced to the correct type and a MalformedRecordException will be thrown if unable to coerce the data into
    +     * the correct type. If schema enforcement is disabled, then no type coercion will occur. As a result, calling
    +     * {@link Record#getValue(org.apache.nifi.serialization.record.RecordField)}
    +     * may return any type of Object, such as a String or another Record, even though the schema indicates that the field must be an integer.
    +     *
    +     * @param enforceSchema whether or not fields in the Record should be validated against the schema and coerced when necessary
    +     *
    +     * @return the next record in the stream or <code>null</code> if no more records are available
    +     * @throws IOException if unable to read from the underlying data
    +     * @throws MalformedRecordException if an unrecoverable failure occurs when trying to parse a record, or a Record contains a field
    +     *             that violates the schema and cannot be coerced into the appropriate field type.
    +     * @throws SchemaValidationException if a Record contains a field that violates the schema and cannot be coerced into the appropriate
    +     *             field type and schema enforcement is enabled
    +     */
    +    Record nextRecord(boolean enforceSchema) throws IOException, MalformedRecordException;
    --- End diff --
    
    I think I actually want to just separate the concept out into two different 
variables here: boolean coerceTypes, boolean dropUnknownRecords. That way it is 
very explicit what is happening, and I don't think that 'strict' vs. 'lenient' 
really conveys those two semantics as well as I'd like.
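
    To make the two-flag idea concrete, here is a rough sketch of what a
    reader with separate flags might look like. It is not the NiFi API:
    SketchRecordReader and InMemoryIntReader are hypothetical names, records
    are modeled as plain Maps, the toy schema treats every known field as an
    integer, and the second flag is assumed to mean "drop fields that are
    not in the schema" (hence dropUnknownFields in the sketch).

    ```java
    import java.util.Iterator;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;

    // Hypothetical, simplified stand-in for RecordReader; not the NiFi interface.
    interface SketchRecordReader {
        // Convenience overload: coerce types and drop unknown fields.
        default Map<String, Object> nextRecord() {
            return nextRecord(true, true);
        }

        // Each behavior gets its own flag instead of one 'enforceSchema' switch.
        Map<String, Object> nextRecord(boolean coerceTypes, boolean dropUnknownFields);
    }

    // Toy reader over in-memory rows; the schema declares every known field as an integer.
    class InMemoryIntReader implements SketchRecordReader {
        private final Set<String> schemaFields;
        private final Iterator<Map<String, Object>> rows;

        InMemoryIntReader(Set<String> schemaFields, List<Map<String, Object>> rows) {
            this.schemaFields = schemaFields;
            this.rows = rows.iterator();
        }

        @Override
        public Map<String, Object> nextRecord(boolean coerceTypes, boolean dropUnknownFields) {
            if (!rows.hasNext()) {
                return null; // no more records are available
            }
            Map<String, Object> out = new LinkedHashMap<>();
            for (Map.Entry<String, Object> e : rows.next().entrySet()) {
                boolean known = schemaFields.contains(e.getKey());
                if (!known && dropUnknownFields) {
                    continue; // field is not in the schema, so leave it out
                }
                Object value = e.getValue();
                if (known && coerceTypes && value instanceof String) {
                    // Throws NumberFormatException when the text is not an integer.
                    value = Integer.valueOf((String) value);
                }
                out.put(e.getKey(), value);
            }
            return out;
        }
    }
    ```

    With a row {id="42", extra="x"} and a schema containing only "id",
    nextRecord() yields {id=42}, while nextRecord(false, false) returns the
    row untouched, String value and extra field included.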


> Implement a ValidateRecord Processor
> ------------------------------------
>
>                 Key: NIFI-4142
>                 URL: https://issues.apache.org/jira/browse/NIFI-4142
>             Project: Apache NiFi
>          Issue Type: New Feature
>          Components: Extensions
>            Reporter: Mark Payne
>            Assignee: Mark Payne
>             Fix For: 1.4.0
>
>
> We need a processor that is capable of validating that all Records in a 
> FlowFile adhere to the proper schema.
> The Processor should be configured with a Record Reader and should route each 
> record to either 'valid' or 'invalid' based on whether or not the record 
> adheres to the reader's schema. A record would be invalid in any of the 
> following cases:
> - Missing field that is required according to the schema
> - Extra field that is not present in schema (it should be configurable 
> whether or not this is a failure)
> - Field requires coercion and strict type checking enabled (this should also 
> be configurable)
> - Field is invalid, such as the value "hello" when it should be an integer
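
The four cases above could be checked roughly as in the sketch below. This
is only an illustration, not the ValidateRecord implementation: SketchField
and SketchValidator are hypothetical names, records are modeled as plain
Maps, and the only coercion considered is String to Integer (plus anything
to String).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical schema field: name, expected Java type, and whether it is required.
class SketchField {
    final String name;
    final Class<?> type;
    final boolean required;

    SketchField(String name, Class<?> type, boolean required) {
        this.name = name;
        this.type = type;
        this.required = required;
    }
}

// Hypothetical validator covering the four invalid cases listed in the issue.
class SketchValidator {
    private final List<SketchField> schema;
    private final boolean allowExtraFields;   // configurable: are extra fields OK?
    private final boolean strictTypeChecking; // configurable: is needing coercion a failure?

    SketchValidator(List<SketchField> schema, boolean allowExtraFields, boolean strictTypeChecking) {
        this.schema = schema;
        this.allowExtraFields = allowExtraFields;
        this.strictTypeChecking = strictTypeChecking;
    }

    // Returns the list of problems; an empty list means the record is 'valid'.
    List<String> validate(Map<String, Object> record) {
        List<String> problems = new ArrayList<>();
        for (SketchField f : schema) {
            Object value = record.get(f.name);
            if (value == null) {
                if (f.required) {
                    problems.add("missing required field: " + f.name);
                }
                continue;
            }
            if (f.type.isInstance(value)) {
                continue; // already the right type
            }
            if (!coercible(value, f.type)) {
                problems.add("invalid value for field: " + f.name);
            } else if (strictTypeChecking) {
                problems.add("field requires coercion: " + f.name);
            }
        }
        if (!allowExtraFields) {
            for (String key : record.keySet()) {
                if (!inSchema(key)) {
                    problems.add("extra field: " + key);
                }
            }
        }
        return problems;
    }

    private boolean inSchema(String name) {
        for (SketchField f : schema) {
            if (f.name.equals(name)) {
                return true;
            }
        }
        return false;
    }

    // Assumption: only String -> Integer and anything -> String are coercible.
    private static boolean coercible(Object value, Class<?> type) {
        if (type == String.class) {
            return true;
        }
        if (type == Integer.class && value instanceof String) {
            try {
                Integer.parseInt((String) value);
                return true;
            } catch (NumberFormatException e) {
                return false;
            }
        }
        return false;
    }
}
```

A processor built on this idea would route a record to 'valid' when
validate() returns an empty list and to 'invalid' otherwise.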



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
