zhipeng93 commented on code in PR #237: URL: https://github.com/apache/flink-ml/pull/237#discussion_r1209245957
##########
flink-ml-lib/src/main/java/org/apache/flink/ml/common/updater/ModelUpdater.java:
##########
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.common.updater;
+
+import org.apache.flink.api.java.tuple.Tuple3;
+import org.apache.flink.runtime.state.StateInitializationContext;
+import org.apache.flink.runtime.state.StateSnapshotContext;
+
+import java.io.Serializable;
+import java.util.Iterator;
+
+/**
+ * A model updater that could be used to handle push/pull requests from workers.
+ *
+ * <p>Note that the model updater should also ensure that model data is robust to failures.
+ */
+public interface ModelUpdater extends Serializable {
+
+    /** Initializes the model data. */
+    void open(long startFeatureIndex, long endFeatureIndex);
+
+    /** Applies the push to update the model data, e.g., using gradients to update the model. */
+    void handlePush(long[] keys, double[] values);
+
+    /** Applies the pull and returns the retrieved model data. */
+    double[] handlePull(long[] keys);

Review Comment:

In this PR, we propose to use two types of roles to describe the iterative machine learning training process, following the idea of parameter servers:
- WorkerOp stores the training data and involves only local computation logic. When it needs to access model parameters, which involves distributed communication, it communicates with ServerOp via the `push/pull` primitives. A `push/pull` could carry sparse key-value pairs or dense values; currently only sparse key-value pairs are supported.
- ServerOp stores the model parameters and provides access to them for the WorkerOps.
- Subtasks of WorkerOp cannot talk to each other, and neither can subtasks of ServerOp.

`handlePush` and `handlePull` are the two operations by which the server answers requests from workers; their naming follows the `push/pull` primitives. `handlePull` may retrieve keys that have previously been updated by `handlePush`, but this is not guaranteed.
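To make the server-side contract concrete, below is a minimal sketch of a model updater that keeps its shard of the model in a hash map and treats pushed values as gradients, applying a plain SGD step per key. The class name `SgdModelUpdater`, the learning-rate parameter, and the map-backed storage are illustrative assumptions, not part of the PR; the Flink state snapshot/restore hooks are omitted, and the class implements `Serializable` directly rather than the `ModelUpdater` interface so the sketch stays self-contained.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

/**
 * Illustrative sketch only (not from the PR): a map-backed model updater
 * that applies one SGD step per pushed key-value pair.
 */
class SgdModelUpdater implements Serializable {

    private final double learningRate;
    // The model shard owned by this server subtask, keyed by feature index.
    private Map<Long, Double> model;

    SgdModelUpdater(double learningRate) {
        this.learningRate = learningRate;
    }

    /** Initializes an empty model shard for the feature range [start, end). */
    public void open(long startFeatureIndex, long endFeatureIndex) {
        this.model = new HashMap<>();
    }

    /** Treats the pushed values as gradients and applies one SGD step per key. */
    public void handlePush(long[] keys, double[] values) {
        for (int i = 0; i < keys.length; i++) {
            model.merge(keys[i], -learningRate * values[i], Double::sum);
        }
    }

    /** Returns the current model values for the requested keys (0.0 if never pushed). */
    public double[] handlePull(long[] keys) {
        double[] result = new double[keys.length];
        for (int i = 0; i < keys.length; i++) {
            result[i] = model.getOrDefault(keys[i], 0.0);
        }
        return result;
    }
}
```

Note that the sketch also illustrates the last point above: a pull for key `2` that was never pushed simply returns the initial value, so pulls are not required to touch only previously pushed keys.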