zhipeng93 commented on code in PR #242:
URL: https://github.com/apache/flink-ml/pull/242#discussion_r1226060309


##########
flink-ml-servable-core/src/main/java/org/apache/flink/ml/linalg/Vector.java:
##########
@@ -24,29 +24,29 @@
 
 import java.io.Serializable;
 
-/** A vector of double values. */
+/** A vector representation of numbers. */
 @TypeInfo(VectorTypeInfoFactory.class)
 @PublicEvolving
-public interface Vector extends Serializable {
+public interface Vector<K extends Number, V extends Number> extends Serializable {

Review Comment:
   I think keeping `V extends Number` would make this interface more extensible. 
Although Spark does not support float32 values, storing data in other data 
types (e.g., float32) does save space.
   
   I looked at the data types used in some machine learning and math 
libraries:
   - Breeze supports int keys and double/float/int/long values [1]. Breeze is 
used by Spark ML, but Spark ML restricts the value type to double.
   - scikit-learn takes NumPy arrays as input, and the data type of a NumPy 
array can be float16, single, double, etc. [2]
   - TensorFlow and PyTorch support tensors of different data types [3].
   
   [1] https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/linalg/DenseVector.scala
   [2] https://numpy.org/doc/stable/user/basics.types.html
   [3] https://www.tensorflow.org/guide/tensor#more_on_dtypes
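
   To make the space argument concrete, here is a minimal sketch of a vector 
interface that is generic in its value type. The names `NumericVector`, 
`DenseFloatVector`, `DenseDoubleVector` and `storageBytes` are hypothetical and 
are not the API proposed in this PR; the key type `K` is left out for brevity.

```java
import java.io.Serializable;

/** Hypothetical sketch: a vector generic in its value type (not the PR's API). */
interface NumericVector<V extends Number> extends Serializable {
    /** Number of elements in the vector. */
    int size();

    /** Element at index i, boxed as V. */
    V get(int i);

    /** Bytes used by the primitive backing array, excluding the array header. */
    long storageBytes();
}

/** Dense vector backed by float[]: 4 bytes per element. */
final class DenseFloatVector implements NumericVector<Float> {
    private final float[] values;

    DenseFloatVector(float[] values) {
        this.values = values.clone(); // defensive copy
    }

    @Override
    public int size() {
        return values.length;
    }

    @Override
    public Float get(int i) {
        return values[i];
    }

    @Override
    public long storageBytes() {
        return (long) values.length * Float.BYTES;
    }
}

/** Dense vector backed by double[]: 8 bytes per element. */
final class DenseDoubleVector implements NumericVector<Double> {
    private final double[] values;

    DenseDoubleVector(double[] values) {
        this.values = values.clone(); // defensive copy
    }

    @Override
    public int size() {
        return values.length;
    }

    @Override
    public Double get(int i) {
        return values[i];
    }

    @Override
    public long storageBytes() {
        return (long) values.length * Double.BYTES;
    }
}
```

   With this shape, a float-backed vector needs half the raw storage of a 
double-backed vector of the same length, at the cost of boxing in the generic 
`get` accessor and reduced precision.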




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
