[ https://issues.apache.org/jira/browse/SPARK-28365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Liang-Chi Hsieh updated SPARK-28365:
------------------------------------
    Summary: Set default locale param for StopWordsRemover to en_US if system default locale isn't in available locales in JVM  (was: Set default locale for StopWordsRemover tests to prevent invalid locale error during test)

> Set default locale param for StopWordsRemover to en_US if system default locale isn't in available locales in JVM
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-28365
>                 URL: https://issues.apache.org/jira/browse/SPARK-28365
>             Project: Spark
>          Issue Type: Test
>          Components: ML, PySpark
>    Affects Versions: 3.0.0
>            Reporter: Liang-Chi Hsieh
>            Priority: Minor
>
> Because my local default locale isn't among the available locales in {{Locale}}, the {{StopWordsRemover}}-related Python tests fail when I run them locally, with errors like:
> {code}
> Traceback (most recent call last):
>   File "/spark-1/python/pyspark/ml/tests/test_feature.py", line 87, in test_stopwordsremover
>     stopWordRemover = StopWordsRemover(inputCol="input", outputCol="output")
>   File "/spark-1/python/pyspark/__init__.py", line 111, in wrapper
>     return func(self, **kwargs)
>   File "/spark-1/python/pyspark/ml/feature.py", line 2646, in __init__
>     self.uid)
>   File "/spark-1/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", line 1554, in __call__
>     answer, self._gateway_client, None, self._fqn)
>   File "/spark-1/python/pyspark/sql/utils.py", line 93, in deco
>     raise converted
> pyspark.sql.utils.IllegalArgumentException: 'StopWordsRemover_4598673ee802 parameter locale given invalid value en_TW.'
> {code}
> As per [~hyukjin.kwon]'s advice, instead of setting the locale just to make the test pass, it is better to fall back to a workable locale when the system default locale can't be found among the JVM's available locales.
> Otherwise, users have to manually change the system locale or access the private property {{_jvm}} in PySpark.
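
The fallback described above can be sketched in plain Python. This is only an illustration of the selection logic, not Spark's implementation: the function name resolve_default_locale, the available set, and the sample locale strings are all hypothetical. In Spark itself the check would presumably run in the JVM against the locales reported by Locale.getAvailableLocales().

```python
def resolve_default_locale(system_default, available, fallback="en_US"):
    """Return system_default if it is an available locale, else fallback.

    Hypothetical helper for illustration only; not part of PySpark.
    """
    if system_default in available:
        return system_default
    return fallback


# Sample values only; a real JVM reports many more locales.
available = {"en_US", "en_GB", "zh_TW"}

print(resolve_default_locale("zh_TW", available))  # zh_TW
print(resolve_default_locale("en_TW", available))  # en_US
```

With this behavior, a system default such as en_TW from the traceback would resolve to en_US instead of raising IllegalArgumentException when the object is constructed.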