Github user ijokarumawak commented on a diff in the pull request:
https://github.com/apache/nifi/pull/2473#discussion_r168883276
--- Diff:
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/csv/AbstractCSVRecordReader.java
---
@@ -0,0 +1,140 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.nifi.csv;
+
+
+import org.apache.nifi.logging.ComponentLog;
+import org.apache.nifi.serialization.RecordReader;
+import org.apache.nifi.serialization.record.DataType;
+import org.apache.nifi.serialization.record.RecordSchema;
+import org.apache.nifi.serialization.record.util.DataTypeUtils;
+import org.apache.nifi.serialization.record.RecordFieldType;
+import java.text.DateFormat;
+import java.util.function.Supplier;
+
+abstract public class AbstractCSVRecordReader implements RecordReader {
+
+ protected final ComponentLog logger;
+ protected final boolean hasHeader;
+ protected final boolean ignoreHeader;
+
+ protected final Supplier<DateFormat> LAZY_DATE_FORMAT;
+ protected final Supplier<DateFormat> LAZY_TIME_FORMAT;
+ protected final Supplier<DateFormat> LAZY_TIMESTAMP_FORMAT;
+
+ protected final String dateFormat;
+ protected final String timeFormat;
+ protected final String timestampFormat;
+
+ protected final RecordSchema schema;
+
+ AbstractCSVRecordReader(final ComponentLog logger, final RecordSchema schema, final boolean hasHeader, final boolean ignoreHeader,
+ final String dateFormat, final String timeFormat, final String timestampFormat) {
+ this.logger = logger;
+ this.schema = schema;
+ this.hasHeader = hasHeader;
+ this.ignoreHeader = ignoreHeader;
+
+ if (dateFormat == null || dateFormat.isEmpty()) {
+ this.dateFormat = RecordFieldType.DATE.getDefaultFormat();
+ LAZY_DATE_FORMAT = () -> null;
--- End diff --
@derekstraka Thanks for updating this. I'm wondering whether this is correct. If
I understand it correctly, `this.dateFormat` is used to check compatibility,
but the actual data conversion is done with `LAZY_DATE_FORMAT`.
And [DataTypeUtils will try to convert the String value to a Long if this Supplier
returns
null](https://github.com/apache/nifi/blob/master/nifi-commons/nifi-record/src/main/java/org/apache/nifi/serialization/record/util/DataTypeUtils.java#L511).
If so, when `dateFormat` is null or empty and the input value is '2018-02-17',
would the result be a NumberFormatException? The unit test covers the default
'yyyy-MM-dd' format and the custom format case, but not the case where a
null/empty format is specified. Would you add that one, too?
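To illustrate the concern, here is a minimal sketch. It is not the actual DataTypeUtils code: the class `NullFormatSketch` and the method `toDate` are hypothetical stand-ins, and they assume the behavior described above (fall back to parsing the text as a Long when the Supplier returns null).

```java
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.function.Supplier;

// Hypothetical stand-in for the conversion path; not NiFi code.
class NullFormatSketch {

    // Assumption: with no DateFormat available, the text is read as epoch milliseconds.
    public static Date toDate(final String value, final Supplier<DateFormat> format) {
        final DateFormat df = format.get();
        if (df == null) {
            // Throws NumberFormatException for text such as "2018-02-17".
            return new Date(Long.parseLong(value));
        }
        try {
            return df.parse(value);
        } catch (final ParseException e) {
            throw new IllegalArgumentException("Cannot parse " + value, e);
        }
    }

    public static void main(final String[] args) {
        // A Supplier that yields a real format handles the text form.
        System.out.println(toDate("2018-02-17", () -> new SimpleDateFormat("yyyy-MM-dd")));

        // A Supplier that yields null only handles the numeric form.
        try {
            toDate("2018-02-17", () -> null);
        } catch (final NumberFormatException e) {
            System.out.println("NumberFormatException: " + e.getMessage());
        }
    }
}
```

If the real code behaves like this sketch, a test that passes a null or empty format together with a 'yyyy-MM-dd' value should reproduce the exception.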
Also, I wonder how we should treat data in Unix epoch representation. Looking
at the DataTypeUtils.isDateTypeCompatible method, it actually checks whether a
[String is compatible with Integer when the format is
null](https://github.com/apache/nifi/blob/master/nifi-commons/nifi-record/src/main/java/org/apache/nifi/serialization/record/util/DataTypeUtils.java#L535).
Based on the above, I think we should set `this.dateFormat` to `null`
if the `dateFormat` argument is null or empty, instead of using the default format, so
that the value can be treated as a numeric representation of a date.
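A rough sketch of what I have in mind, under stated assumptions: `NullableDateFormatSketch` and `isCompatible` are hypothetical names used only for illustration, and the numeric check is a simplified stand-in for what DataTypeUtils does when the format is null.

```java
import java.text.DateFormat;
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.function.Supplier;

// Hypothetical illustration of the proposal; not the reader class from this PR.
class NullableDateFormatSketch {

    // null means "no pattern configured": values are expected to be numeric.
    public final String dateFormat;
    public final Supplier<DateFormat> lazyDateFormat;

    public NullableDateFormatSketch(final String dateFormatArg) {
        if (dateFormatArg == null || dateFormatArg.isEmpty()) {
            // Keep both the pattern and the Supplier result null so that the
            // compatibility check and the conversion agree with each other.
            this.dateFormat = null;
            this.lazyDateFormat = () -> null;
        } else {
            this.dateFormat = dateFormatArg;
            this.lazyDateFormat = () -> new SimpleDateFormat(dateFormatArg);
        }
    }

    // Simplified stand-in for a compatibility check (assumption, not NiFi code).
    public boolean isCompatible(final String value) {
        if (dateFormat == null) {
            try {
                Long.parseLong(value);
                return true;
            } catch (final NumberFormatException e) {
                return false;
            }
        }
        final SimpleDateFormat df = new SimpleDateFormat(dateFormat);
        df.setLenient(false);
        final ParsePosition pos = new ParsePosition(0);
        // Compatible only if the whole string was consumed by the pattern.
        return df.parse(value, pos) != null && pos.getIndex() == value.length();
    }
}
```

With this shape, an empty format accepts epoch values and rejects '2018-02-17' at the compatibility check, rather than failing later during conversion.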
What do you think?
---