weibozhao commented on code in PR #156:
URL: https://github.com/apache/flink-ml/pull/156#discussion_r992921852


##########
flink-ml-lib/src/main/java/org/apache/flink/ml/feature/vectorassembler/VectorAssembler.java:
##########
@@ -47,10 +47,15 @@
 
 /**
  * A Transformer which combines a given list of input columns into a vector column. Types of input
- * columns must be either vector or numerical value.
+ * columns must be either vector or numerical types. The elements assembled in the same column must
+ * have the same size. The operator deals with null values or records with wrong sizes according to
+ * the strategy specified by the {@link HasHandleInvalid} parameter as follows:
  *
- * <p>The `keep` option of {@link HasHandleInvalid} means that we output bad rows with output column
- * set to null.
+ * <p>The `keep` option means that we do the assembling action without checking the vector size.

Review Comment:
   assembling [1,2], [3,4,5] with sizes 2,2 may get [1, 2, 3, 4] (trim to fit size)
   assembling [1,2], [3,4,5] with sizes 2,2 may get [1, 2, NaN, NaN]
   assembling [1,2], null with sizes 2,2 may get [1, 2, NaN, NaN]
   assembling [1,2], null with sizes 2,2 may get [1, 2, 0, 0]
   
   These situations will not occur: such inputs do not exist, at least not in the flink-ml code.
   
   assembling [1,2], [3,4] with sizes 2,3 may get [1, 2, 3, 4]
   assembling [1,2], [3,4] with sizes 2,3 may get [1, 2, 3, 4, 0] (padding with zeros)
   
   Assembling two vectors means concatenating them, so the result size is the sum of the vector sizes (here 2 + 3 = 5). This is common sense: the result should be [1, 2, 3, 4, 0]. There is no ambiguity.


