lindong28 commented on code in PR #242:
URL: https://github.com/apache/flink-ml/pull/242#discussion_r1225027030


##########
flink-ml-servable-core/src/main/java/org/apache/flink/ml/linalg/Vector.java:
##########
@@ -24,29 +24,29 @@
 
 import java.io.Serializable;
 
-/** A vector of double values. */
+/** A vector representation of numbers. */
 @TypeInfo(VectorTypeInfoFactory.class)
 @PublicEvolving
-public interface Vector extends Serializable {
+public interface Vector<K extends Number, V extends Number> extends 
Serializable {

Review Comment:
   Given that we only need to store values as double would it be simpler to 
remove `V extends Number`?
   
   We can extend the interface to support using float as the value type. But 
given that Spark ML does not provide this optimization, it is not clear 
whether/when we will ever need it.



##########
flink-ml-servable-core/src/main/java/org/apache/flink/ml/linalg/Vector.java:
##########
@@ -24,29 +24,29 @@
 
 import java.io.Serializable;
 
-/** A vector of double values. */
+/** A vector representation of numbers. */
 @TypeInfo(VectorTypeInfoFactory.class)
 @PublicEvolving
-public interface Vector extends Serializable {
+public interface Vector<K extends Number, V extends Number> extends 
Serializable {
 
     /** Gets the size of the vector. */
-    int size();
+    K size();
 
     /** Gets the value of the ith element. */
-    double get(int i);
+    V get(K i);
 
     /** Sets the value of the ith element. */
-    void set(int i, double value);
+    void set(K i, V value);

Review Comment:
   Would it be simpler to use `void set(long i, double value)`?



##########
flink-ml-servable-core/src/main/java/org/apache/flink/ml/linalg/Vector.java:
##########
@@ -24,29 +24,29 @@
 
 import java.io.Serializable;
 
-/** A vector of double values. */
+/** A vector representation of numbers. */
 @TypeInfo(VectorTypeInfoFactory.class)
 @PublicEvolving
-public interface Vector extends Serializable {
+public interface Vector<K extends Number, V extends Number> extends 
Serializable {
 
     /** Gets the size of the vector. */
-    int size();
+    K size();

Review Comment:
   Would it be simpler to use `long size()`?



##########
flink-ml-servable-core/src/main/java/org/apache/flink/ml/linalg/Vector.java:
##########
@@ -24,29 +24,29 @@
 
 import java.io.Serializable;
 
-/** A vector of double values. */
+/** A vector representation of numbers. */
 @TypeInfo(VectorTypeInfoFactory.class)
 @PublicEvolving
-public interface Vector extends Serializable {
+public interface Vector<K extends Number, V extends Number> extends 
Serializable {
 
     /** Gets the size of the vector. */
-    int size();
+    K size();
 
     /** Gets the value of the ith element. */
-    double get(int i);
+    V get(K i);
 
     /** Sets the value of the ith element. */
-    void set(int i, double value);
+    void set(K i, V value);
 
-    /** Converts the instance to a double array. */
-    double[] toArray();
+    /** Converts the instance to a primitive array. */
+    Object toArray();
 
     /** Converts the instance to a dense vector. */
-    DenseVector toDense();
+    Vector<K, V> toDense();
 
     /** Converts the instance to a sparse vector. */
-    SparseVector toSparse();
+    Vector<K, V> toSparse();

Review Comment:
   The intention of defining this method in an interface is to have the caller 
use the return value via public APIs of `Vector<K, V>`.
   
   However, it appears that caller of this method uses return values via public 
fields/APIs that do not belong to `Vector<K, V>`. For example, 
`MinHashLSHModelData#hashFunction()` accesses the field 
`SparseIntDoubleVector#indices` via `vec.toSparse().indices`. The code will 
break if we change the concrete class from `SparseIntDoubleVector` to 
`SparseLongDoubleVector`.



##########
flink-ml-servable-core/src/main/java/org/apache/flink/ml/linalg/Vector.java:
##########
@@ -24,29 +24,29 @@
 
 import java.io.Serializable;
 
-/** A vector of double values. */
+/** A vector representation of numbers. */
 @TypeInfo(VectorTypeInfoFactory.class)
 @PublicEvolving
-public interface Vector extends Serializable {
+public interface Vector<K extends Number, V extends Number> extends 
Serializable {
 
     /** Gets the size of the vector. */
-    int size();
+    K size();
 
     /** Gets the value of the ith element. */
-    double get(int i);
+    V get(K i);

Review Comment:
   Would it be simpler to just use `double get(long i)`?



##########
flink-ml-servable-core/src/main/java/org/apache/flink/ml/linalg/Vector.java:
##########
@@ -24,29 +24,29 @@
 
 import java.io.Serializable;
 
-/** A vector of double values. */
+/** A vector representation of numbers. */
 @TypeInfo(VectorTypeInfoFactory.class)
 @PublicEvolving
-public interface Vector extends Serializable {
+public interface Vector<K extends Number, V extends Number> extends 
Serializable {
 
     /** Gets the size of the vector. */
-    int size();
+    K size();
 
     /** Gets the value of the ith element. */
-    double get(int i);
+    V get(K i);
 
     /** Sets the value of the ith element. */
-    void set(int i, double value);
+    void set(K i, V value);
 
-    /** Converts the instance to a double array. */
-    double[] toArray();
+    /** Converts the instance to a primitive array. */
+    Object toArray();

Review Comment:
   The implementation of this API always return `double[]`. Would it be simpler 
to still use `double[] toArray()`?



##########
flink-ml-servable-core/src/main/java/org/apache/flink/ml/linalg/SparseLongDoubleVector.java:
##########
@@ -0,0 +1,174 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.flink.ml.linalg;
+
+import org.apache.flink.annotation.PublicEvolving;
+import org.apache.flink.api.common.typeinfo.TypeInfo;
+import 
org.apache.flink.ml.linalg.typeinfo.SparseLongDoubleVectorTypeInfoFactory;
+import org.apache.flink.util.Preconditions;
+
+import java.util.Arrays;
+
+/**
+ * A sparse vector with long as keys and double as values.
+ *
+ * <p>TODO: Add processing logic for {@link SparseLongDoubleVector} for 
existing algorithms.
+ */
+@TypeInfo(SparseLongDoubleVectorTypeInfoFactory.class)
+@PublicEvolving
+public class SparseLongDoubleVector implements LongDoubleVector {
+
+    public final long n;
+    public long[] indices;

Review Comment:
   Would it be useful to make these fields private and expose them via methods 
(e.g. `getIndices()` or `getIndicesAsLongArray()`) so that we can use the same 
caller code for different implementations of SparseVector?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to