zhipeng93 commented on code in PR #242:
URL: https://github.com/apache/flink-ml/pull/242#discussion_r1226060309
##########
flink-ml-servable-core/src/main/java/org/apache/flink/ml/linalg/Vector.java:
##########
@@ -24,29 +24,29 @@
import java.io.Serializable;
-/** A vector of double values. */
+/** A vector representation of numbers. */
@TypeInfo(VectorTypeInfoFactory.class)
@PublicEvolving
-public interface Vector extends Serializable {
+public interface Vector<K extends Number, V extends Number> extends Serializable {
Review Comment:
I think keeping `V extends Number` would make this interface more extensible.
Although Spark does not support float32 as a value type, storing data in
narrower types (e.g., float32) does save space.
I looked at the data types supported by several machine learning and math
libraries:
- Breeze supports int keys and double/float/int/long values [1]. Breeze is
used by Spark ML, but Spark ML restricts the value type to double.
- scikit-learn takes NumPy arrays as input, and NumPy arrays can hold
float16, single, double, etc. [2]
- TensorFlow and PyTorch support tensors of different data types [3]
[1]
https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/linalg/DenseVector.scala
[2] https://numpy.org/doc/stable/user/basics.types.html
[3] https://www.tensorflow.org/guide/tensor#more_on_dtypes
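To make the space argument concrete, here is a minimal sketch of what a value-generic vector could look like. All names in it (`NumVector`, `FloatDenseVector`, `DoubleDenseVector`, `valueBytes`) are hypothetical and are not part of the Flink ML API or of this PR; it only illustrates that a float32-backed implementation holds half the raw value payload of a double-backed one.

```java
// Hypothetical sketch only: these types are NOT the Flink ML API.
// K is the index type and V the value type, mirroring the shape of
// Vector<K extends Number, V extends Number> discussed above.
interface NumVector<K extends Number, V extends Number> {
    K size();

    V get(K index);

    // Raw bytes used by the stored values (array header overhead excluded).
    long valueBytes();
}

// Values stored as float32 in a primitive array.
final class FloatDenseVector implements NumVector<Integer, Float> {
    private final float[] values;

    FloatDenseVector(float[] values) {
        this.values = values;
    }

    @Override
    public Integer size() {
        return values.length;
    }

    @Override
    public Float get(Integer index) {
        return values[index];
    }

    @Override
    public long valueBytes() {
        return (long) values.length * Float.BYTES;
    }
}

// Values stored as float64 in a primitive array.
final class DoubleDenseVector implements NumVector<Integer, Double> {
    private final double[] values;

    DoubleDenseVector(double[] values) {
        this.values = values;
    }

    @Override
    public Integer size() {
        return values.length;
    }

    @Override
    public Double get(Integer index) {
        return values[index];
    }

    @Override
    public long valueBytes() {
        return (long) values.length * Double.BYTES;
    }
}

class VectorSketch {
    public static void main(String[] args) {
        NumVector<Integer, Float> f = new FloatDenseVector(new float[] {1f, 2f, 3f, 4f});
        NumVector<Integer, Double> d = new DoubleDenseVector(new double[] {1d, 2d, 3d, 4d});
        // Four float32 values take 16 bytes; four float64 values take 32 bytes.
        System.out.println(f.valueBytes() + " vs " + d.valueBytes());
    }
}
```

One trade-off worth noting in this sketch: the generic `get` returns boxed `Float`/`Double` objects, so hot loops would likely still want to work on the primitive backing arrays directly.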