[ https://issues.apache.org/jira/browse/SPARK-48837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun resolved SPARK-48837.
-----------------------------------
Resolution: Fixed
Issue resolved by pull request 47258
[https://github.com/apache/spark/pull/47258]
> In CountVectorizer, only read binary parameter once per transform, not once
> per row
> -----------------------------------------------------------------------------------
>
> Key: SPARK-48837
> URL: https://issues.apache.org/jira/browse/SPARK-48837
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Affects Versions: 2.0.0
> Reporter: Josh Rosen
> Assignee: Josh Rosen
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
>
> SPARK-13629 added a binary parameter to CountVectorizer, but because of how
> the code is structured, the parameter is read once per row inside a UDF.
> Instead, we should read it once per transform call (similar to how other
> parameters are read).
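
The pattern described in the issue can be illustrated with a minimal, Spark-free sketch. This is not the code from the pull request: ToyCountVectorizerModel, get_binary, param_reads, and both transform methods are hypothetical names invented for illustration, and a plain Python list stands in for the DataFrame rows that the UDF would process. The sketch only shows why hoisting the parameter lookup out of the per-row function reduces the number of reads from one per row to one per transform call.

```python
from collections import Counter


class ToyCountVectorizerModel:
    """Hypothetical stand-in for a fitted model; not Spark's actual class."""

    def __init__(self, vocabulary, binary=False):
        self.vocabulary = {term: i for i, term in enumerate(vocabulary)}
        self._binary = binary
        self.param_reads = 0  # counts how often the parameter is looked up

    def get_binary(self):
        # Stands in for a parameter lookup that has some per-call cost.
        self.param_reads += 1
        return self._binary

    def _vectorize(self, tokens, binary):
        # Count in-vocabulary tokens; clamp counts to 1 when binary is set.
        counts = Counter(t for t in tokens if t in self.vocabulary)
        vec = [0] * len(self.vocabulary)
        for term, n in counts.items():
            vec[self.vocabulary[term]] = 1 if binary else n
        return vec

    def transform_per_row(self, rows):
        # Before: the parameter lookup happens inside the per-row function.
        return [self._vectorize(r, self.get_binary()) for r in rows]

    def transform_hoisted(self, rows):
        # After: read the parameter once, then reuse the plain value per row.
        binary = self.get_binary()
        return [self._vectorize(r, binary) for r in rows]
```

Both methods produce the same vectors; only the number of parameter reads differs, which is the improvement the issue asks for.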