GitHub user zentol commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3838#discussion_r133159049
  
    --- Diff: flink-libraries/flink-streaming-python/src/main/java/org/apache/flink/streaming/python/api/functions/PythonOutputSelector.java ---
    @@ -0,0 +1,62 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.flink.streaming.python.api.functions;
    +
    +import org.apache.flink.streaming.api.collector.selector.OutputSelector;
    +import org.apache.flink.streaming.python.util.serialization.SerializationUtils;
    +import org.python.core.PyObject;
    +
    +import java.io.IOException;
    +
    +/**
    + * The {@code PythonOutputSelector} is a thin wrapper layer over a Python UDF {@code OutputSelector}.
    + * It receives an {@code OutputSelector} as an input and keeps it internally in a serialized form.
    + * It is then delivered, as part of the job graph, up to the TaskManager, then it is opened and becomes
    + * a sort of mediator to the Python UDF {@code OutputSelector}.
    + *
    + * <p>This function is used internally by the Python thin wrapper layer over the streaming data
    + * functionality</p>
    + */
    +public class PythonOutputSelector implements OutputSelector<PyObject> {
    +   private static final long serialVersionUID = 909266346633598177L;
    +
    +   private final byte[] serFun;
    +   private transient OutputSelector<PyObject> fun;
    +
    +   public PythonOutputSelector(OutputSelector<PyObject> fun) throws IOException {
    +           this.serFun = SerializationUtils.serializeObject(fun);
    +   }
    +
    +   @Override
    +   @SuppressWarnings("unchecked")
    +   public Iterable<String> select(PyObject value) {
    +           if (this.fun == null) {
    +                   try {
    +                           this.fun = (OutputSelector<PyObject>) SerializationUtils.deserializeObject(this.serFun);
    +                   } catch (IOException e) {
    +                           e.printStackTrace();
    --- End diff --
    
    I would throw a RuntimeException to fail the job. If the function can't be deserialized, something went terribly wrong during deployment.
    
    Returning null would cause an NPE later on in `DirectedOutput`, obscuring what actually caused the error.
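
A minimal, self-contained sketch of the suggestion above, not the actual patch. To run without Flink or Jython on the classpath, it assumes a hypothetical `Selector` interface in place of `OutputSelector` and plain Java serialization in place of `SerializationUtils`; the `byte[]` constructor and the `EvenOdd` function are likewise made up for the demonstration. The relevant part is the `catch` block, which rethrows with the original exception as the cause rather than printing the stack trace and continuing.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Collections;

public class LazySelectorSketch {

    /** Hypothetical stand-in for Flink's OutputSelector; not the real interface. */
    interface Selector<T> extends Serializable {
        Iterable<String> select(T value);
    }

    /** Example user function: routes integers to an "even" or "odd" output. */
    static class EvenOdd implements Selector<Integer> {
        private static final long serialVersionUID = 1L;

        @Override
        public Iterable<String> select(Integer value) {
            return Collections.singletonList(value % 2 == 0 ? "even" : "odd");
        }
    }

    /** Keeps the wrapped selector as bytes and restores it on first use. */
    static class SerializedSelector<T> implements Selector<T> {
        private static final long serialVersionUID = 1L;

        private final byte[] serFun;
        private transient Selector<T> fun;

        SerializedSelector(Selector<T> fun) throws IOException {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(fun);
            }
            this.serFun = bytes.toByteArray();
        }

        /** Hypothetical constructor, present only so the failure path can be demonstrated. */
        SerializedSelector(byte[] serFun) {
            this.serFun = serFun;
        }

        @Override
        @SuppressWarnings("unchecked")
        public Iterable<String> select(T value) {
            if (fun == null) {
                try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(serFun))) {
                    fun = (Selector<T>) in.readObject();
                } catch (IOException | ClassNotFoundException e) {
                    // Fail fast and keep the root cause, instead of logging and returning null.
                    throw new RuntimeException("Failed to deserialize the output selector.", e);
                }
            }
            return fun.select(value);
        }
    }

    public static void main(String[] args) throws IOException {
        // Happy path: the selector round-trips through bytes and is restored lazily.
        Selector<Integer> ok = new SerializedSelector<>(new EvenOdd());
        System.out.println(ok.select(4));

        // Failure path: bytes that are not a valid serialization stream.
        Selector<Integer> broken = new SerializedSelector<>(new byte[] {1, 2, 3});
        try {
            broken.select(4);
        } catch (RuntimeException e) {
            System.out.println("failed as intended, cause: " + e.getCause().getClass().getSimpleName());
        }
    }
}
```

With this shape the caller either gets a usable result or an exception that names the deserialization failure directly, so the error surfaces at its source instead of as a later NPE.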

