[GitHub] [flink-ml] lindong28 commented on a change in pull request #10: [FLINK-24354][FLIP-174] Improve the WithParams interface

GitBox Sat, 16 Oct 2021 17:30:00 -0700


lindong28 commented on a change in pull request #10:
URL: https://github.com/apache/flink-ml/pull/10#discussion_r730194806




##########
File path: flink-ml-api/src/main/java/org/apache/flink/ml/api/core/Pipeline.java
##########
@@ -97,17 +100,17 @@ public PipelineModel fit(Table... inputs) {
     }
 
     @Override
-    public void save(String path) throws IOException {
-        throw new UnsupportedOperationException();
+    public Map<Param<?>, Object> getUserDefinedParamMap() {
+        return paramMap;
     }
 
-    public static Pipeline load(String path) throws IOException {
-        throw new UnsupportedOperationException();
+    @Override
+    public void save(String path) throws IOException {
+        ReadWriteUtils.savePipeline(this, stages, path);

Review comment:
       Just to clarify, the current implementation didn't assume the client can 
access the stored model data. It just assumes the client can access the stored 
model metadata such as class name and parameters, which, unlike model data, is 
usually small in size and can be stored in a single file.
   
   The PR does not try to save/load any model data. The work to save/load model 
data will be left to the implementation of save/load of each `Stage` and is 
supposed to happen during the Job Graph execution.
   
   My understanding is that  this concern is unrelated to this PR. This is the 
metadata of the store has to be read from some external storage. If the client 
can not access this storage, there is nothing this PR can do, right?
   
   

##########
File path: flink-ml-api/src/main/java/org/apache/flink/ml/api/core/Pipeline.java
##########
@@ -97,17 +100,17 @@ public PipelineModel fit(Table... inputs) {
     }
 
     @Override
-    public void save(String path) throws IOException {
-        throw new UnsupportedOperationException();
+    public Map<Param<?>, Object> getUserDefinedParamMap() {
+        return paramMap;
     }
 
-    public static Pipeline load(String path) throws IOException {
-        throw new UnsupportedOperationException();
+    @Override
+    public void save(String path) throws IOException {
+        ReadWriteUtils.savePipeline(this, stages, path);

Review comment:
       Just to clarify, the current implementation does not assume the client 
can access the stored model data. It just assumes the client can access the 
stored model metadata such as class name and parameters, which, unlike model 
data, is usually small in size and can be stored in a single file.
   
   The PR does not try to save/load any model data. The work to save/load model 
data will be left to the implementation of save/load of each `Stage` and is 
supposed to happen during the Job Graph execution.
   
   My understanding is that  this concern is unrelated to this PR. This is the 
metadata of the store has to be read from some external storage. If the client 
can not access this storage, there is nothing this PR can do, right?
   
   

##########
File path: flink-ml-api/src/main/java/org/apache/flink/ml/api/core/Pipeline.java
##########
@@ -97,17 +100,17 @@ public PipelineModel fit(Table... inputs) {
     }
 
     @Override
-    public void save(String path) throws IOException {
-        throw new UnsupportedOperationException();
+    public Map<Param<?>, Object> getUserDefinedParamMap() {
+        return paramMap;
     }
 
-    public static Pipeline load(String path) throws IOException {
-        throw new UnsupportedOperationException();
+    @Override
+    public void save(String path) throws IOException {
+        ReadWriteUtils.savePipeline(this, stages, path);

Review comment:
       Just to clarify, the current implementation does not assume the client 
can access the stored model data. It just assumes the client can access the 
stored model metadata such as class name and parameters, which, unlike model 
data, is usually small in size and can be stored in a single file.
   
   The PR does not try to save/load any model data. The work to save/load model 
data will be left to the implementation of save/load of each `Stage` and is 
supposed to happen during the Job Graph execution.
   
   Note that the path to the metadata file can be a on a remote storage (e.g. 
HDFS, Amazon S3), and in the future we can extend the loadMetadata(...) to 
fetch file from those remote storage, as long as the client can access those 
remote storage
   
   My understanding is that  this concern is unrelated to this PR. This is the 
metadata of the store has to be read from some external storage. If the client 
can not access this storage, there is nothing this PR can do, right?
   
   

##########
File path: flink-ml-api/src/main/java/org/apache/flink/ml/api/core/Pipeline.java
##########
@@ -97,17 +100,17 @@ public PipelineModel fit(Table... inputs) {
     }
 
     @Override
-    public void save(String path) throws IOException {
-        throw new UnsupportedOperationException();
+    public Map<Param<?>, Object> getUserDefinedParamMap() {
+        return paramMap;
     }
 
-    public static Pipeline load(String path) throws IOException {
-        throw new UnsupportedOperationException();
+    @Override
+    public void save(String path) throws IOException {
+        ReadWriteUtils.savePipeline(this, stages, path);

Review comment:
       Just to clarify, the current implementation does not assume the client 
can access the stored model data. It just assumes the client can access the 
stored model metadata such as class name and parameters, which, unlike model 
data, is usually small in size and can be stored in a single file.
   
   The PR does not try to save/load any model data. The work to save/load model 
data will be left to the implementation of save/load of each `Stage` and is 
supposed to happen during the Job Graph execution.
   
   Note that the path to the metadata file can be a on a remote storage (e.g. 
HDFS, Amazon S3), and in the future we can extend the loadMetadata(...) to 
fetch file from those remote storage, as long as the client can access those 
remote storage.
   
   My understanding is that  this concern is unrelated to this PR. This is the 
metadata of the store has to be read from some external storage. If the client 
can not access this storage, there is nothing this PR can do, right?
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [flink-ml] lindong28 commented on a change in pull request #10: [FLINK-24354][FLIP-174] Improve the WithParams interface

Reply via email to