[ https://issues.apache.org/jira/browse/SPARK-26458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16730188#comment-16730188 ]
Marco Gaido commented on SPARK-26458:
-------------------------------------

What issue are you encountering? Can you provide a reproducer, along with the current and expected behavior? Thanks.

> OneHotEncoderModel verifies the number of category values incorrectly when it tries to transform a dataframe.
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-26458
>                 URL: https://issues.apache.org/jira/browse/SPARK-26458
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 2.3.1
>            Reporter: duruihuan
>            Priority: Major
>
> When handleInvalid is set to "keep", the categorySizes used in transformSchema should not be compared with the number of values in the metadata of the dataframe being transformed. Some columns of that dataframe may contain more than one invalid value, which triggers the exception thrown at lines 302-306 of OneHotEncoderEstimator.scala. In short, I think the verifyNumOfValues call in the transformSchema method (line 299) should be removed.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
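To make the reporter's argument concrete, here is a minimal, hypothetical Python model of the check described in the quoted report. It is not Spark's Scala code. It assumes categories are integer indices seen at fit time, that under handleInvalid="keep" every unseen index is mapped to one extra trailing slot, and that the check compares the metadata's declared value count with the model's expected size using strict equality. `encode`, `strict_check`, and `relaxed_check` are illustrative names, the size arithmetic ignores dropLast, and "skip the check under keep" is only one possible reading of the proposal.

```python
from typing import List

# Hypothetical model of the behavior discussed in SPARK-26458; NOT Spark's implementation.
# Assumptions: categories are indices 0..num_fitted-1 seen at fit time; with
# handle_invalid="keep", every unseen index shares ONE extra trailing slot; dropLast ignored.


def encode(index: int, num_fitted: int, handle_invalid: str) -> List[int]:
    """One-hot encode a category index under the assumptions above."""
    keep = handle_invalid == "keep"
    vec = [0] * (num_fitted + (1 if keep else 0))
    if 0 <= index < num_fitted:
        vec[index] = 1
    elif keep:
        vec[num_fitted] = 1  # all invalid values land in the same extra slot
    else:
        raise ValueError(f"unseen category index {index} with handle_invalid={handle_invalid!r}")
    return vec


def strict_check(metadata_num_values: int, num_fitted: int, handle_invalid: str) -> None:
    """Equality check in the spirit of verifyNumOfValues, as the report describes it."""
    expected = num_fitted + (1 if handle_invalid == "keep" else 0)
    if metadata_num_values != expected:
        raise ValueError(
            f"expected {expected} categorical values, but metadata specifies {metadata_num_values}"
        )


def relaxed_check(metadata_num_values: int, num_fitted: int, handle_invalid: str) -> None:
    """Hypothetical relaxation: skip the comparison entirely when handle_invalid is 'keep'."""
    if handle_invalid != "keep":
        strict_check(metadata_num_values, num_fitted, handle_invalid)


if __name__ == "__main__":
    # Fitted on 3 categories; the new column's metadata declares 5 values (2 unseen).
    print(encode(3, 3, "keep"))  # [0, 0, 0, 1]
    print(encode(4, 3, "keep"))  # [0, 0, 0, 1]
    try:
        strict_check(5, 3, "keep")
    except ValueError as err:
        print("strict:", err)
    relaxed_check(5, 3, "keep")  # passes: both unseen values fit the single extra slot
```

Under these assumptions, both unseen values encode into the same fixed-length vector, so the strict equality check rejects input that the "keep" path could actually handle. Whether Spark's real check behaves exactly this way is what the requested reproducer would confirm.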