yunfengzhou-hub commented on code in PR #156:
URL: https://github.com/apache/flink-ml/pull/156#discussion_r991744425
##########
flink-ml-python/pyflink/ml/lib/feature/vectorassembler.py:
##########
@@ -31,17 +33,38 @@ class _VectorAssemblerParams(
Params for :class:`VectorAssembler`.
"""
+ INPUT_SIZES: Param[Tuple[int, ...]] = IntArrayParam(
+ "input_sizes",
+ "Sizes of the assembling elements.",
Review Comment:
Let's keep the description the same as that in Java.
##########
flink-ml-lib/src/main/java/org/apache/flink/ml/feature/vectorassembler/VectorAssembler.java:
##########
@@ -47,10 +47,15 @@
/**
* A Transformer which combines a given list of input columns into a vector column. Types of input
- * columns must be either vector or numerical value.
+ * columns must be either vector or numerical types. The elements assembled in the same column must
+ * have the same size. If the element is null or has the wrong size, we will process this case with
Review Comment:
Let's avoid writing the JavaDoc using the first-person perspective. For
example, instead of saying "we will process", let's say "the operator deals
with null values or records with wrong sizes according to the strategy
specified by the {@link HasHandleInvalid} parameter as follows".
##########
docs/content/docs/operators/feature/vectorassembler.md:
##########
@@ -44,11 +50,12 @@ Types of input columns must be either vector or numerical value.
### Parameters
-| Key           | Default    | Type     | Required | Description                                                                    |
-|---------------|------------|----------|----------|--------------------------------------------------------------------------------|
-| inputCols     | `null`     | String[] | yes      | Input column names.                                                            |
-| outputCol     | `"output"` | String   | no       | Output column name.                                                            |
-| handleInvalid | `"error"`  | String   | no       | Strategy to handle invalid entries. Supported values: 'error', 'skip', 'keep'. |
+| Key             | Default    | Type      | Required | Description                                                                    |
+|-----------------|------------|-----------|----------|--------------------------------------------------------------------------------|
+| inputCols       | `null`     | String[]  | yes      | Input column names.                                                            |
+| outputCol       | `"output"` | String    | no       | Output column name.                                                            |
+| inputSizes      | `null`     | Integer[] | yes      | Sizes of the assembling elements.                                              |
Review Comment:
Let's keep the description the same as that in Java.
##########
flink-ml-lib/src/main/java/org/apache/flink/ml/feature/vectorassembler/VectorAssembler.java:
##########
@@ -47,10 +47,15 @@
/**
* A Transformer which combines a given list of input columns into a vector column. Types of input
- * columns must be either vector or numerical value.
+ * columns must be either vector or numerical types. The elements assembled in the same column must
+ * have the same size. If the element is null or has the wrong size, we will process this case with
+ * {@link HasHandleInvalid} parameter as follows:
*
- * <p>The `keep` option of {@link HasHandleInvalid} means that we output bad rows with output column
- * set to null.
+ * <p>The `keep` option means that we do not check the vector size, and keep all rows.
Review Comment:
If the size does not match the expected size, what would be the output value
of the row?
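
To make the question concrete, here is a minimal Python sketch of the three strategies as the proposed JavaDoc describes them. It is not Flink ML code: `assemble` and `element_size` are hypothetical helpers, and padding a null element with NaN under `keep` is purely an assumption, since the JavaDoc does not say what is emitted in that case.

```python
import math
from typing import List, Optional, Sequence, Union

Number = Union[int, float]
Element = Union[Number, Sequence[Number], None]


def element_size(element: Element) -> Optional[int]:
    """Return 1 for a scalar, len() for a vector, and None for a null element."""
    if element is None:
        return None
    if isinstance(element, (int, float)):
        return 1
    return len(element)


def assemble(
    row: Sequence[Element],
    input_sizes: Sequence[int],
    handle_invalid: str = "error",
) -> Optional[List[float]]:
    """Hypothetical per-row logic; returns None when the row is skipped."""
    if handle_invalid not in ("error", "skip", "keep"):
        raise ValueError(f"Unsupported handleInvalid value: {handle_invalid}")
    if len(row) != len(input_sizes):
        raise ValueError("row and input_sizes must have the same length")

    sizes = [element_size(e) for e in row]
    valid = all(actual == expected for actual, expected in zip(sizes, input_sizes))
    if not valid:
        if handle_invalid == "error":
            raise ValueError(f"Expected sizes {list(input_sizes)}, got {sizes}")
        if handle_invalid == "skip":
            return None
        # "keep": sizes are not checked, so the row is assembled as-is.

    output: List[float] = []
    for element, expected in zip(row, input_sizes):
        if element is None:
            # Assumption: a null element is padded with NaN to its declared size.
            output.extend([math.nan] * expected)
        elif isinstance(element, (int, float)):
            output.append(float(element))
        else:
            output.extend(float(x) for x in element)
    return output
```

Under this reading, a row whose vector has 3 entries where `inputSizes` declares 2 yields, with `keep`, an output vector one entry longer than the declared total. The JavaDoc should state whether that is the intended output.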