lindong28 commented on a change in pull request #10:
URL: https://github.com/apache/flink-ml/pull/10#discussion_r730194806
##########
File path: flink-ml-api/src/main/java/org/apache/flink/ml/api/core/Pipeline.java
##########
@@ -97,17 +100,17 @@ public PipelineModel fit(Table... inputs) {
}
@Override
- public void save(String path) throws IOException {
- throw new UnsupportedOperationException();
+ public Map<Param<?>, Object> getUserDefinedParamMap() {
+ return paramMap;
}
- public static Pipeline load(String path) throws IOException {
- throw new UnsupportedOperationException();
+ @Override
+ public void save(String path) throws IOException {
+ ReadWriteUtils.savePipeline(this, stages, path);
Review comment:
Just to clarify, the current implementation didn't assume the client can
access the stored model data. It just assumes the client can access the stored
model metadata such as class name and parameters, which, unlike model data, is
usually small in size and can be stored in a single file.
The PR does not try to save/load any model data. The work to save/load model
data will be left to the implementation of save/load of each `Stage` and is
supposed to happen during the Job Graph execution.
My understanding is that this concern is unrelated to this PR. This is the
metadata of the store has to be read from some external storage. If the client
can not access this storage, there is nothing this PR can do, right?
##########
File path: flink-ml-api/src/main/java/org/apache/flink/ml/api/core/Pipeline.java
##########
@@ -97,17 +100,17 @@ public PipelineModel fit(Table... inputs) {
}
@Override
- public void save(String path) throws IOException {
- throw new UnsupportedOperationException();
+ public Map<Param<?>, Object> getUserDefinedParamMap() {
+ return paramMap;
}
- public static Pipeline load(String path) throws IOException {
- throw new UnsupportedOperationException();
+ @Override
+ public void save(String path) throws IOException {
+ ReadWriteUtils.savePipeline(this, stages, path);
Review comment:
Just to clarify, the current implementation does not assume the client
can access the stored model data. It just assumes the client can access the
stored model metadata such as class name and parameters, which, unlike model
data, is usually small in size and can be stored in a single file.
The PR does not try to save/load any model data. The work to save/load model
data will be left to the implementation of save/load of each `Stage` and is
supposed to happen during the Job Graph execution.
My understanding is that this concern is unrelated to this PR. This is the
metadata of the store has to be read from some external storage. If the client
can not access this storage, there is nothing this PR can do, right?
##########
File path: flink-ml-api/src/main/java/org/apache/flink/ml/api/core/Pipeline.java
##########
@@ -97,17 +100,17 @@ public PipelineModel fit(Table... inputs) {
}
@Override
- public void save(String path) throws IOException {
- throw new UnsupportedOperationException();
+ public Map<Param<?>, Object> getUserDefinedParamMap() {
+ return paramMap;
}
- public static Pipeline load(String path) throws IOException {
- throw new UnsupportedOperationException();
+ @Override
+ public void save(String path) throws IOException {
+ ReadWriteUtils.savePipeline(this, stages, path);
Review comment:
Just to clarify, the current implementation does not assume the client
can access the stored model data. It just assumes the client can access the
stored model metadata such as class name and parameters, which, unlike model
data, is usually small in size and can be stored in a single file.
The PR does not try to save/load any model data. The work to save/load model
data will be left to the implementation of save/load of each `Stage` and is
supposed to happen during the Job Graph execution.
Note that the path to the metadata file can be a on a remote storage (e.g.
HDFS, Amazon S3), and in the future we can extend the loadMetadata(...) to
fetch file from those remote storage, as long as the client can access those
remote storage
My understanding is that this concern is unrelated to this PR. This is the
metadata of the store has to be read from some external storage. If the client
can not access this storage, there is nothing this PR can do, right?
##########
File path: flink-ml-api/src/main/java/org/apache/flink/ml/api/core/Pipeline.java
##########
@@ -97,17 +100,17 @@ public PipelineModel fit(Table... inputs) {
}
@Override
- public void save(String path) throws IOException {
- throw new UnsupportedOperationException();
+ public Map<Param<?>, Object> getUserDefinedParamMap() {
+ return paramMap;
}
- public static Pipeline load(String path) throws IOException {
- throw new UnsupportedOperationException();
+ @Override
+ public void save(String path) throws IOException {
+ ReadWriteUtils.savePipeline(this, stages, path);
Review comment:
Just to clarify, the current implementation does not assume the client
can access the stored model data. It just assumes the client can access the
stored model metadata such as class name and parameters, which, unlike model
data, is usually small in size and can be stored in a single file.
The PR does not try to save/load any model data. The work to save/load model
data will be left to the implementation of save/load of each `Stage` and is
supposed to happen during the Job Graph execution.
Note that the path to the metadata file can be a on a remote storage (e.g.
HDFS, Amazon S3), and in the future we can extend the loadMetadata(...) to
fetch file from those remote storage, as long as the client can access those
remote storage.
My understanding is that this concern is unrelated to this PR. This is the
metadata of the store has to be read from some external storage. If the client
can not access this storage, there is nothing this PR can do, right?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]