[ 
https://issues.apache.org/jira/browse/NIFI-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16367946#comment-16367946
 ] 

ASF GitHub Bot commented on NIFI-4882:
--------------------------------------

Github user ijokarumawak commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2473#discussion_r168883276
  
    --- Diff: nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/csv/AbstractCSVRecordReader.java ---
    @@ -0,0 +1,140 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.nifi.csv;
    +
    +
    +import org.apache.nifi.logging.ComponentLog;
    +import org.apache.nifi.serialization.RecordReader;
    +import org.apache.nifi.serialization.record.DataType;
    +import org.apache.nifi.serialization.record.RecordSchema;
    +import org.apache.nifi.serialization.record.util.DataTypeUtils;
    +import org.apache.nifi.serialization.record.RecordFieldType;
    +import java.text.DateFormat;
    +import java.util.function.Supplier;
    +
    +abstract public class AbstractCSVRecordReader implements RecordReader {
    +
    +    protected final ComponentLog logger;
    +    protected final boolean hasHeader;
    +    protected final boolean ignoreHeader;
    +
    +    protected final Supplier<DateFormat> LAZY_DATE_FORMAT;
    +    protected final Supplier<DateFormat> LAZY_TIME_FORMAT;
    +    protected final Supplier<DateFormat> LAZY_TIMESTAMP_FORMAT;
    +
    +    protected final String dateFormat;
    +    protected final String timeFormat;
    +    protected final String timestampFormat;
    +
    +    protected final RecordSchema schema;
    +
    +    AbstractCSVRecordReader(final ComponentLog logger, final RecordSchema schema, final boolean hasHeader, final boolean ignoreHeader,
    +                            final String dateFormat, final String timeFormat, final String timestampFormat) {
    +        this.logger = logger;
    +        this.schema = schema;
    +        this.hasHeader = hasHeader;
    +        this.ignoreHeader = ignoreHeader;
    +
    +        if (dateFormat == null || dateFormat.isEmpty()) {
    +            this.dateFormat = RecordFieldType.DATE.getDefaultFormat();
    +            LAZY_DATE_FORMAT = () -> null;
    --- End diff --
    
    @derekstraka Thanks for updating this. I'm not sure this is correct, though. If 
I understand it correctly, `this.dateFormat` is used to check compatibility, 
but the actual data conversion is done with `LAZY_DATE_FORMAT`. 
    And [DataTypeUtils will try to convert a String value to a Long if this Supplier 
returns 
null](https://github.com/apache/nifi/blob/master/nifi-commons/nifi-record/src/main/java/org/apache/nifi/serialization/record/util/DataTypeUtils.java#L511).
    
    If so, when `dateFormat` is null or empty and the input value is '2018-02-17', 
wouldn't the result be a NumberFormatException? The unit test covers the default 
'yyyy-MM-dd' format and the custom format case, but not the case where a 
null/empty format is specified. Could you add that one, too?
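
The concern above can be reproduced with a minimal sketch. It assumes the conversion falls back to parsing the string as epoch milliseconds when the supplier returns null. `NullFormatDemo` and `toDate` are hypothetical stand-ins for illustration, not NiFi's actual `DataTypeUtils` code.

```java
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.function.Supplier;

// Hypothetical stand-in for the conversion path described in the comment;
// this is NOT NiFi's DataTypeUtils implementation.
final class NullFormatDemo {

    // Assumed behavior: when the supplier yields no DateFormat, the value is
    // parsed as a long (epoch milliseconds); otherwise the format is applied.
    static Date toDate(final String value, final Supplier<DateFormat> format) {
        final DateFormat dateFormat = format.get();
        if (dateFormat == null) {
            // Throws NumberFormatException for non-numeric input such as "2018-02-17".
            return new Date(Long.parseLong(value));
        }
        try {
            return dateFormat.parse(value);
        } catch (final ParseException e) {
            throw new IllegalArgumentException("Value does not match the format: " + value, e);
        }
    }

    public static void main(final String[] args) {
        final Supplier<DateFormat> nullFormat = () -> null;
        final Supplier<DateFormat> isoFormat = () -> new SimpleDateFormat("yyyy-MM-dd");

        // Parsed with the explicit format.
        System.out.println(toDate("2018-02-17", isoFormat));
        // Parsed as epoch milliseconds because no format is available.
        System.out.println(toDate("1518825600000", nullFormat));
        try {
            // A date string combined with a null format hits the numeric path and fails.
            toDate("2018-02-17", nullFormat);
        } catch (final NumberFormatException e) {
            System.out.println("NumberFormatException: " + e.getMessage());
        }
    }
}
```

Under these assumptions, a unit test for the null/empty format case would need to decide whether '2018-02-17' is expected to fail or to be parsed with the default format.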
    
    Also, I wonder how we should treat data in Unix epoch representation. 
Looking at the DataTypeUtils.isDateTypeCompatible method, it actually checks whether a 
[String is compatible with Integer when the format is 
null](https://github.com/apache/nifi/blob/master/nifi-commons/nifi-record/src/main/java/org/apache/nifi/serialization/record/util/DataTypeUtils.java#L535).
    
    Based on the above, I think we should set `this.dateFormat` to `null` 
if the `dateFormat` argument is null or empty, instead of using the default format, so 
that the value can be treated as a numeric representation of a date.
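
One way to picture this suggestion is the sketch below. `DateFormatConfig` and `isCompatible` are hypothetical names, and the compatibility check is a simplified approximation (a `Long` parse when no format is set, a strict full-length parse otherwise), not the logic in the pull request or in `DataTypeUtils`.

```java
import java.text.DateFormat;
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.function.Supplier;

// Hypothetical sketch of the suggested handling; names are illustrative only.
final class DateFormatConfig {
    // null means "no format configured": values are treated as numeric (epoch) dates.
    final String dateFormat;
    final Supplier<DateFormat> lazyDateFormat;

    DateFormatConfig(final String dateFormat) {
        if (dateFormat == null || dateFormat.isEmpty()) {
            // Keep both the pattern and the supplier null so that the
            // compatibility check and the conversion agree with each other.
            this.dateFormat = null;
            this.lazyDateFormat = () -> null;
        } else {
            this.dateFormat = dateFormat;
            this.lazyDateFormat = () -> new SimpleDateFormat(dateFormat);
        }
    }

    // Simplified compatibility check (an assumption, not NiFi's implementation).
    boolean isCompatible(final String value) {
        if (dateFormat == null) {
            try {
                Long.parseLong(value);
                return true;
            } catch (final NumberFormatException e) {
                return false;
            }
        }
        final DateFormat format = lazyDateFormat.get();
        format.setLenient(false);
        final ParsePosition position = new ParsePosition(0);
        // Require the whole string to be consumed, not just a matching prefix.
        return format.parse(value, position) != null && position.getIndex() == value.length();
    }
}
```

With this shape, `new DateFormatConfig("")` accepts only numeric input, while `new DateFormatConfig("MM/dd/yyyy")` accepts '01/01/1900' and rejects '2018-02-17'.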
    
    What do you think?


> CSVRecordReader should utilize specified date/time/timestamp format at its 
> convertSimpleIfPossible method
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: NIFI-4882
>                 URL: https://issues.apache.org/jira/browse/NIFI-4882
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Extensions
>            Reporter: Koji Kawamura
>            Assignee: Derek Straka
>            Priority: Major
>
> CSVRecordReader.convertSimpleIfPossible method is used by ValidateRecord. The 
> method does not coerce values to the target schema field type if the raw 
> string representation in the input CSV file is not compatible.
> The type compatibility check is implemented as follows, but it does not use 
> the user-specified date/time/timestamp format:
> {code}
>                 // This will return 'false' for input '01/01/1900' when user specified custom format 'MM/dd/YYYY'
>                 if (DataTypeUtils.isCompatibleDataType(trimmed, dataType)) {
>                     // The LAZY_DATE_FORMAT should be used to check compatibility, too.
>                     return DataTypeUtils.convertType(trimmed, dataType, LAZY_DATE_FORMAT, LAZY_TIME_FORMAT, LAZY_TIMESTAMP_FORMAT, fieldName);
>                 } else {
>                     return value;
>                 }
> {code}
> If input date strings use a format other than the default 
> 'yyyy-MM-dd', the ValidateRecord processor cannot validate the input records.
> JacksonCSVRecordReader has methods identical to those of CSVRecordReader. These 
> classes should share an abstract base class.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
