[ https://issues.apache.org/jira/browse/SPARK-41548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon updated SPARK-41548:
---------------------------------
    Description:
There are failures in `test_connect_functions` with ANSI mode on (https://github.com/apache/spark/actions/runs/3709431687/jobs/6288067223). I tried to fix them, but they are tricky to fix because Spark Connect does not respect the runtime configuration on the server side.
It is also tricky to make the tests pass with ANSI mode both on and off. Therefore, this disables ANSI mode temporarily so that the other tests can pass. Note that PySpark tests stop in the middle if one fails.

{code:java}
======================================================================
ERROR [0.264s]: test_date_ts_functions (pyspark.sql.tests.connect.test_connect_function.SparkConnectFunctionTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/__w/spark/spark/python/pyspark/sql/tests/connect/test_connect_function.py", line 1149, in test_date_ts_functions
    cdf.select(cfunc(cdf.ts1)).toPandas(),
  File "/__w/spark/spark/python/pyspark/sql/connect/dataframe.py", line 1533, in toPandas
    return self._session.client._to_pandas(query)
  File "/__w/spark/spark/python/pyspark/sql/connect/client.py", line 333, in _to_pandas
    return self._execute_and_fetch(req)
  File "/__w/spark/spark/python/pyspark/sql/connect/client.py", line 418, in _execute_and_fetch
    for b in self._stub.ExecutePlan(req, metadata=self._builder.metadata()):
  File "/usr/local/lib/python3.9/dist-packages/grpc/_channel.py", line 426, in __next__
    return self._next()
  File "/usr/local/lib/python3.9/dist-packages/grpc/_channel.py", line 826, in _next
    raise self
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
	status = StatusCode.UNKNOWN
	details = "[CAST_INVALID_INPUT] The value '1997/02/28 10:30:00' of the type "STRING" cannot be cast to "DATE" because it is malformed. Correct the value as per the syntax, or change its target type. Use `try_cast` to tolerate malformed input and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error."
	debug_error_string = "UNKNOWN:Error received from peer ipv4:127.0.0.1:15002 {grpc_message:"[CAST_INVALID_INPUT] The value \'1997/02/28 10:30:00\' of the type \"STRING\" cannot be cast to \"DATE\" because it is malformed. Correct the value as per the syntax, or change its target type. Use `try_cast` to tolerate malformed input and return NULL instead. If necessary set \"spark.sql.ansi.enabled\" to \"false\" to bypass this error.", grpc_status:2, created_time:"2022-12-16T01:49:15.71844837+00:00"}"
>

======================================================================
ERROR [0.527s]: test_string_functions_one_arg (pyspark.sql.tests.connect.test_connect_function.SparkConnectFunctionTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/__w/spark/spark/python/pyspark/sql/tests/connect/test_connect_function.py", line 985, in test_string_functions_one_arg
    cdf.select(cfunc("a"), cfunc(cdf.b)).toPandas(),
  File "/__w/spark/spark/python/pyspark/sql/connect/dataframe.py", line 1533, in toPandas
    return self._session.client._to_pandas(query)
  File "/__w/spark/spark/python/pyspark/sql/connect/client.py", line 333, in _to_pandas
    return self._execute_and_fetch(req)
  File "/__w/spark/spark/python/pyspark/sql/connect/client.py", line 418, in _execute_and_fetch
    for b in self._stub.ExecutePlan(req, metadata=self._builder.metadata()):
  File "/usr/local/lib/python3.9/dist-packages/grpc/_channel.py", line 426, in __next__
    return self._next()
  File "/usr/local/lib/python3.9/dist-packages/grpc/_channel.py", line 826, in _next
    raise self
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
	status = StatusCode.UNKNOWN
	details = "[CAST_INVALID_INPUT] The value ' ab ' of the type "STRING" cannot be cast to "BIGINT" because it is malformed. Correct the value as per the syntax, or change its target type. Use `try_cast` to tolerate malformed input and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error."
	debug_error_string = "UNKNOWN:Error received from peer ipv4:127.0.0.1:15002 {grpc_message:"[CAST_INVALID_INPUT] The value \' ab \' of the type \"STRING\" cannot be cast to \"BIGINT\" because it is malformed. Correct the value as per the syntax, or change its target type. Use `try_cast` to tolerate malformed input and return NULL instead. If necessary set \"spark.sql.ansi.enabled\" to \"false\" to bypass this error.", grpc_status:2, created_time:"2022-12-16T01:49:25.529953492+00:00"}"
>

----------------------------------------------------------------------
Ran 14 tests in 40.832s
{code}

  was:
There are too many failures in test_connect_functions with ANSI mode on, see [https://github.com/apache/spark/actions/runs/3709431687/jobs/6288067223]

This Jira aims to disable the tests for now to preserve the test coverage in other tests. PySpark tests fail in the middle if one fails.

> Disable ANSI mode in pyspark.sql.tests.connect.test_connect_functions
> ---------------------------------------------------------------------
>
>                 Key: SPARK-41548
>                 URL: https://issues.apache.org/jira/browse/SPARK-41548
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect, Tests
>    Affects Versions: 3.4.0
>            Reporter: Hyukjin Kwon
>            Priority: Major
>
> There are failures in `test_connect_functions` with ANSI mode on
> (https://github.com/apache/spark/actions/runs/3709431687/jobs/6288067223). I
> tried to fix them, but they are tricky to fix because Spark Connect does not
> respect the runtime configuration on the server side.
> It is also tricky to make the tests pass with ANSI mode both on and off.
> Therefore, this disables ANSI mode temporarily so that the other tests can
> pass. Note that PySpark tests stop in the middle if one fails.
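Both failures reported in this issue come from the same behavior: with ANSI mode on, casting a malformed string raises `CAST_INVALID_INPUT`, while with ANSI mode off the same cast returns NULL, as the error message itself states. The following is a minimal sketch of that difference in plain Python. It is a toy model, not Spark code: `cast_to_bigint` and `CastInvalidInput` are hypothetical names introduced only for illustration, and Python's `int()` only approximates Spark's parsing rules.

```python
from typing import Optional


class CastInvalidInput(ValueError):
    """Hypothetical stand-in for Spark's CAST_INVALID_INPUT error."""


def cast_to_bigint(value: str, ansi_enabled: bool) -> Optional[int]:
    """Toy model of a STRING -> BIGINT cast with ANSI mode on or off."""
    try:
        # int() ignores surrounding whitespace; this is an approximation,
        # not a reimplementation of Spark's cast.
        return int(value)
    except ValueError:
        if ansi_enabled:
            # ANSI mode on: malformed input is an error.
            raise CastInvalidInput(
                f'The value {value!r} of the type "STRING" '
                'cannot be cast to "BIGINT" because it is malformed.'
            ) from None
        # ANSI mode off: malformed input becomes NULL (None here).
        return None


if __name__ == "__main__":
    print(cast_to_bigint(" ab ", ansi_enabled=False))
    try:
        cast_to_bigint(" ab ", ansi_enabled=True)
    except CastInvalidInput as exc:
        print("ANSI mode on:", exc)
```

A test that expects NULL for `' ab '` can therefore only pass when the server evaluating the cast actually has ANSI mode off, which is why a client-side setting that the server ignores does not help.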