[
https://issues.apache.org/jira/browse/NIFI-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16367834#comment-16367834
]
ASF GitHub Bot commented on NIFI-4882:
--------------------------------------
Github user derekstraka commented on a diff in the pull request:
https://github.com/apache/nifi/pull/2473#discussion_r168861958
--- Diff:
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/csv/AbstractCSVRecordReader.java
---
@@ -0,0 +1,110 @@
+package org.apache.nifi.csv;
+
+
+import org.apache.nifi.logging.ComponentLog;
+import org.apache.nifi.serialization.RecordReader;
+import org.apache.nifi.serialization.record.DataType;
+import org.apache.nifi.serialization.record.RecordSchema;
+import org.apache.nifi.serialization.record.util.DataTypeUtils;
+import org.apache.nifi.serialization.record.RecordFieldType;
+import java.text.DateFormat;
+import java.util.function.Supplier;
+
+abstract public class AbstractCSVRecordReader implements RecordReader {
+
+ protected final ComponentLog logger;
+ protected final boolean hasHeader;
+ protected final boolean ignoreHeader;
+
+ protected final Supplier<DateFormat> LAZY_DATE_FORMAT;
+ protected final Supplier<DateFormat> LAZY_TIME_FORMAT;
+ protected final Supplier<DateFormat> LAZY_TIMESTAMP_FORMAT;
+
+ protected final String dateFormat;
+ protected final String timeFormat;
+ protected final String timestampFormat;
+
+ protected final RecordSchema schema;
+
+ AbstractCSVRecordReader(final ComponentLog logger, final RecordSchema schema, final boolean hasHeader, final boolean ignoreHeader,
+ final String dateFormat, final String timeFormat, final String timestampFormat) {
+ this.logger = logger;
+ this.schema = schema;
+ this.hasHeader = hasHeader;
+ this.ignoreHeader = ignoreHeader;
+ this.dateFormat = (dateFormat == null) ? RecordFieldType.DATE.getDefaultFormat() : dateFormat;
+ this.timeFormat = (timeFormat == null) ? RecordFieldType.TIME.getDefaultFormat(): timeFormat;
+ this.timestampFormat = (timestampFormat == null) ? RecordFieldType.TIMESTAMP.getDefaultFormat(): timestampFormat;
+
+ final DateFormat df = (dateFormat == null) ? null : DataTypeUtils.getDateFormat(dateFormat);
+ final DateFormat tf = (timeFormat == null) ? null : DataTypeUtils.getDateFormat(timeFormat);
+ final DateFormat tsf = (timestampFormat == null) ? null : DataTypeUtils.getDateFormat(timestampFormat);
--- End diff --
It can come in as null if the property is not set, but you are correct that
it could also be an empty string. An empty string should fall back to the
default format as well. I'll rework this a bit.
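
For illustration only, here is a minimal sketch of the null-or-empty fallback described above. It is not the code from the pull request: the class name FormatDefaults and both helper methods are hypothetical, the default pattern is taken from the 'yyyy-MM-dd' default named in the issue description, and treating a whitespace-only string as unset is an extra assumption.

```java
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.function.Supplier;

// Hypothetical helper for illustration; not part of the pull request under review.
final class FormatDefaults {

    // Assumed default, matching the 'yyyy-MM-dd' default named in the issue description.
    static final String DEFAULT_DATE_FORMAT = "yyyy-MM-dd";

    private FormatDefaults() {
    }

    // Returns the configured pattern, or the fallback when the configured value
    // is null, empty, or whitespace only (the last case is an assumption).
    static String patternOrDefault(final String configured, final String fallback) {
        return (configured == null || configured.trim().isEmpty()) ? fallback : configured;
    }

    // SimpleDateFormat is not thread-safe, so the supplier creates a new
    // instance on every call instead of sharing one.
    static Supplier<DateFormat> lazyFormat(final String pattern) {
        return () -> new SimpleDateFormat(pattern);
    }

    public static void main(final String[] args) {
        System.out.println(patternOrDefault(null, DEFAULT_DATE_FORMAT));
        System.out.println(patternOrDefault("", DEFAULT_DATE_FORMAT));
        System.out.println(patternOrDefault("MM/dd/yyyy", DEFAULT_DATE_FORMAT));
    }
}
```

With a helper along these lines, the pattern is resolved once, and both the stored format string and the lazily created DateFormat are derived from the same resolved value, so a null or empty setting cannot leave one of them unset.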
> CSVRecordReader should utilize specified date/time/timestamp format at its
> convertSimpleIfPossible method
> ---------------------------------------------------------------------------------------------------------
>
> Key: NIFI-4882
> URL: https://issues.apache.org/jira/browse/NIFI-4882
> Project: Apache NiFi
> Issue Type: Bug
> Components: Extensions
> Reporter: Koji Kawamura
> Assignee: Derek Straka
> Priority: Major
>
> The CSVRecordReader.convertSimpleIfPossible method is used by ValidateRecord. The
> method does not coerce a value to the target schema field type if the raw
> string representation in the input CSV file is not compatible with that type.
> The type compatibility check is implemented as follows, but it does not use the
> user-specified date/time/timestamp format:
> {code}
> // This will return 'false' for input '01/01/1900' when user specified custom format 'MM/dd/YYYY'
> if (DataTypeUtils.isCompatibleDataType(trimmed, dataType)) {
>     // The LAZY_DATE_FORMAT should be used to check compatibility, too.
>     return DataTypeUtils.convertType(trimmed, dataType, LAZY_DATE_FORMAT, LAZY_TIME_FORMAT, LAZY_TIMESTAMP_FORMAT, fieldName);
> } else {
> return value;
> }
> {code}
> If input date strings have a different format than the default format
> 'yyyy-MM-dd', then the ValidateRecord processor cannot validate input records.
> JacksonCSVRecordReader has methods identical to those in CSVRecordReader. Those
> classes should share an abstract base class.
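
The compatibility problem described in the issue can be illustrated with a small standalone sketch. It uses only java.text and is not NiFi's DataTypeUtils implementation; the class DateCompatibility and the method isDateCompatible are hypothetical names. The sketch uses the pattern 'MM/dd/yyyy', because in SimpleDateFormat lowercase 'y' is the calendar year while uppercase 'Y' is the week year.

```java
import java.text.DateFormat;
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

// Hypothetical illustration; not NiFi's DataTypeUtils.
final class DateCompatibility {

    private DateCompatibility() {
    }

    // Returns true only when the entire value parses strictly under the given pattern.
    static boolean isDateCompatible(final String value, final String pattern) {
        if (value == null || pattern == null) {
            return false;
        }
        final DateFormat format = new SimpleDateFormat(pattern);
        format.setLenient(false); // reject out-of-range fields such as month 13
        format.setTimeZone(TimeZone.getTimeZone("UTC")); // keep results independent of the host zone
        final ParsePosition position = new ParsePosition(0);
        // parse returns null on failure; also require that no trailing text is left over
        return format.parse(value, position) != null && position.getIndex() == value.length();
    }

    public static void main(final String[] args) {
        System.out.println(isDateCompatible("01/01/1900", "yyyy-MM-dd"));
        System.out.println(isDateCompatible("01/01/1900", "MM/dd/yyyy"));
    }
}
```

Checked against the default pattern, the value '01/01/1900' is rejected; checked against the user-specified pattern, it is accepted. That difference is why the compatibility check needs to receive the configured format as well.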