[GitHub] [flink] afedulov commented on a change in pull request #17598: [WIP][FLINK-24703][connectors][formats] Add FileSource support for reading CSV files.

GitBox Tue, 16 Nov 2021 03:18:02 -0800


afedulov commented on a change in pull request #17598:
URL: https://github.com/apache/flink/pull/17598#discussion_r750166974




##########
File path: flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/src/reader/DelimitedFormat.java
##########
@@ -0,0 +1,131 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.connector.file.src.reader;
+
+import org.apache.flink.annotation.PublicEvolving;
+import org.apache.flink.api.common.serialization.DeserializationSchema;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.core.fs.FSDataInputStream;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+import java.util.Scanner;
+
+import static org.apache.flink.util.Preconditions.checkNotNull;
+
+/**
+ * A reader format that reads blocks of bytes from a file. The blocks are separated by the specified
+ * delimiting pattern and deserialized using the provided deserialization schema.
+ *
+ * <p>The reader uses Java's built-in {@link Scanner} to decode the byte stream using various
+ * supported charset encodings.
+ *
+ * <p>This format does not support optimized recovery from checkpoints. On recovery, it will re-read
+ * and discard the number of lines that were processed before the last checkpoint. That is due to
+ * the fact that the offsets of lines in the file cannot be tracked through the charset decoders
+ * with their internal buffering of stream input and charset decoder state.
+ */
+@PublicEvolving
+public class DelimitedFormat<T> extends SimpleStreamFormat<T> {

Review comment:
       This is an obsolete approach, please ignore. This PR is still WIP and not ready for review.
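
For context, the recovery behaviour described in the Javadoc above (re-read the stream from the start and discard the records already processed before the last checkpoint) can be sketched roughly as follows. This is only an illustration built on the JDK's Scanner, not the code from this PR: the class name DelimitedScanSketch, the method readRecords, and the recordsToSkip parameter are hypothetical, and a plain InputStream stands in for Flink's FSDataInputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

/** Hypothetical stand-alone sketch; this is not the DelimitedFormat class from the PR. */
final class DelimitedScanSketch {

    /**
     * Reads delimiter-separated records from the stream, dropping the first
     * {@code recordsToSkip} records (assumed to be the count restored from a checkpoint).
     */
    static List<String> readRecords(
            InputStream in, String charsetName, String delimiterPattern, long recordsToSkip)
            throws IOException {
        List<String> records = new ArrayList<>();
        try (Scanner scanner = new Scanner(in, charsetName)) {
            // The delimiter is interpreted as a regular expression.
            scanner.useDelimiter(delimiterPattern);
            long skipped = 0;
            while (scanner.hasNext()) {
                String record = scanner.next();
                if (skipped < recordsToSkip) {
                    // Byte offsets are hidden behind the decoder's buffering,
                    // so recovery counts records instead of seeking.
                    skipped++;
                    continue;
                }
                records.add(record);
            }
            // Scanner swallows I/O errors; surface the last one, if any.
            IOException failure = scanner.ioException();
            if (failure != null) {
                throw failure;
            }
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "a\nb\nc\nd".getBytes(StandardCharsets.UTF_8);
        // Pretend two records were processed before the last checkpoint.
        System.out.println(readRecords(new ByteArrayInputStream(data), "UTF-8", "\n", 2));
        // prints [c, d]
    }
}
```

The cost of this approach is that recovery time grows with the number of records already consumed from the file, which is the trade-off the Javadoc refers to as "not optimized".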



