[ 
https://issues.apache.org/jira/browse/FLINK-3889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15286328#comment-15286328
 ] 

ASF GitHub Bot commented on FLINK-3889:
---------------------------------------

Github user aljoscha commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1984#discussion_r63494540
  
    --- Diff: flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/source/FileSplitReadOperator.java ---
    @@ -0,0 +1,368 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.flink.streaming.api.functions.source;
    +
    +import org.apache.flink.api.common.io.CheckpointableInputFormat;
    +import org.apache.flink.api.common.io.FileInputFormat;
    +import org.apache.flink.api.common.typeinfo.TypeInformation;
    +import org.apache.flink.api.common.typeutils.TypeSerializer;
    +import org.apache.flink.api.java.tuple.Tuple2;
    +import org.apache.flink.api.java.tuple.Tuple3;
    +import org.apache.flink.configuration.Configuration;
    +import org.apache.flink.core.fs.FileInputSplit;
    +import org.apache.flink.core.memory.DataInputViewStreamWrapper;
    +import org.apache.flink.runtime.state.AbstractStateBackend;
    +import org.apache.flink.runtime.state.StreamStateHandle;
    +import org.apache.flink.streaming.api.operators.AbstractStreamOperator;
    +import org.apache.flink.streaming.api.operators.OneInputStreamOperator;
    +import org.apache.flink.streaming.api.operators.TimestampedCollector;
    +import org.apache.flink.streaming.api.watermark.Watermark;
    +import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;
    +import org.apache.flink.streaming.runtime.tasks.StreamTaskState;
    +import org.apache.flink.util.Preconditions;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
    +
    +import java.io.IOException;
    +import java.io.InputStream;
    +import java.io.ObjectInputStream;
    +import java.io.ObjectOutputStream;
    +import java.io.Serializable;
    +import java.util.ArrayList;
    +import java.util.LinkedList;
    +import java.util.List;
    +import java.util.Queue;
    +
    +import static org.apache.flink.util.Preconditions.checkNotNull;
    +
    +/**
    + * This is the operator that reads the splits received from {@link FileSplitMonitoringFunction}.
    + * This operator will receive just the split descriptors and then read and emit records. This may lead
    + * to backpressure. To avoid this, we will have another thread actually reading the splits and
    + * another forwarding the checkpoint barriers. The two should sync so that the checkpoints reflect the
    + * current state.
    + * */
    +public class FileSplitReadOperator<OUT, S extends Serializable> extends AbstractStreamOperator<OUT>
    +   implements OneInputStreamOperator<FileInputSplit, OUT> {
    +
    +   private static final Logger LOG = LoggerFactory.getLogger(FileSplitReadOperator.class);
    +
    +   private static final FileInputSplit EOF = new FileInputSplit(-1, null, -1, -1, null);
    +
    +   private transient SplitReader<S, OUT> reader;
    +   private transient TimestampedCollector<OUT> collector;
    +
    +   private Configuration configuration;
    +   private FileInputFormat<OUT> format;
    +   private TypeInformation<OUT> typeInfo;
    --- End diff --
    
    This is subtle, but not all `TypeInformation` implementations are `Serializable`, and none of them should be. This is a problem that we introduced a while back.
    
    The way to handle it is to implement `OutputTypeConfigurable` and create the `TypeSerializer` there, instead of keeping the `TypeInformation` as a field. In `open()`, you should then check that you actually have a `TypeSerializer`.
    
    And yes, I know that no one can really know this without having run into a serialization problem at least once... 😅
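
To make the suggestion above concrete, here is a minimal sketch of the pattern in plain Java. It does not use Flink's real classes: `SketchTypeInfo`, `SketchSerializer`, `SketchReadOperator`, and `OutputTypeSketch` are hypothetical stand-ins for `TypeInformation`, `TypeSerializer`, the operator under review, and a small test driver. The assumption is that the framework calls the `setOutputType` hook before the operator is serialized and shipped, so the operator only ever stores the serializer and never the type information itself.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical stand-in for TypeInformation: deliberately NOT Serializable.
final class SketchTypeInfo<T> {
    private final Class<T> type;

    SketchTypeInfo(Class<T> type) {
        this.type = type;
    }

    // Derives a serializable serializer from the type description.
    SketchSerializer<T> createSerializer() {
        return new SketchSerializer<>(type.getName());
    }
}

// Hypothetical stand-in for TypeSerializer: Serializable, so it can travel with the operator.
final class SketchSerializer<T> implements Serializable {
    private static final long serialVersionUID = 1L;
    final String typeName;

    SketchSerializer(String typeName) {
        this.typeName = typeName;
    }
}

// Hypothetical stand-in for the operator: it has no type-information field at all.
final class SketchReadOperator<OUT> implements Serializable {
    private static final long serialVersionUID = 1L;
    private SketchSerializer<OUT> serializer;

    // Plays the role of the OutputTypeConfigurable callback (assumed to run before shipping).
    void setOutputType(SketchTypeInfo<OUT> typeInfo) {
        this.serializer = typeInfo.createSerializer();
    }

    // Fails fast if the callback was never invoked.
    void open() {
        if (serializer == null) {
            throw new IllegalStateException("Output type was never configured; no serializer available.");
        }
    }

    SketchSerializer<OUT> getSerializer() {
        return serializer;
    }
}

final class OutputTypeSketch {
    // Serializes and deserializes a value with Java serialization, as shipping an operator would.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SketchReadOperator<String> operator = new SketchReadOperator<>();
        operator.setOutputType(new SketchTypeInfo<>(String.class));

        // The operator survives serialization because it holds only the serializer.
        SketchReadOperator<String> restored = roundTrip(operator);
        restored.open();
        System.out.println(restored.getSerializer().typeName);
    }
}
```

Had the operator kept a `SketchTypeInfo` field instead, `roundTrip` would fail with a `NotSerializableException` wrapped in an `UncheckedIOException`, which is the kind of problem the comment warns about. In the real operator, the equivalent change would presumably mean dropping the `typeInfo` field and filling a `TypeSerializer<OUT>` field from the `OutputTypeConfigurable` callback.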


> Make File Monitoring Function checkpointable.
> ---------------------------------------------
>
>                 Key: FLINK-3889
>                 URL: https://issues.apache.org/jira/browse/FLINK-3889
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Streaming
>            Reporter: Kostas Kloudas
>            Assignee: Kostas Kloudas
>
> This is essentially the combination of FLINK-3808 and FLINK-3717.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
