Kristin Cowalcijk created SEDONA-706:
----------------------------------------

             Summary: Python DataFrame API has problems working in 
multi-threaded environments
                 Key: SEDONA-706
                 URL: https://issues.apache.org/jira/browse/SEDONA-706
             Project: Apache Sedona
          Issue Type: Bug
            Reporter: Kristin Cowalcijk
             Fix For: 1.7.1


This issue was reported in 
[https://github.com/apache/sedona/issues/1771|https://github.com/apache/sedona/issues/1771].
 The user wanted to call ST functions using the DataFrame API, but an exception 
was raised.

Further investigation showed that the DataFrame API relies on 
{{SparkSession.getActiveSession}} to construct Spark SQL UDF calls. The "active 
session" is thread-local, so {{SparkSession.getActiveSession}} only returns a 
valid session in the thread that started the Spark session. I believe the 
user's Python backend handles requests in a different thread, and that thread 
has no active session.
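
The thread-local behavior described above can be illustrated without Spark. 
The following is a minimal sketch using only the Python standard library; the 
names {{set_active_session}}, {{get_active_session}} and {{lookup_in_worker}} 
are hypothetical stand-ins, not PySpark or Sedona APIs. It only shows why a 
value stored in thread-local state by one thread is invisible to another.

```python
import threading

# Stand-in for the per-thread "active session" slot.
# This is an illustration only, not how PySpark stores its session.
_active = threading.local()


def set_active_session(session):
    """Record a session for the calling thread only."""
    _active.session = session


def get_active_session():
    """Return the calling thread's session, or None if it never set one."""
    return getattr(_active, "session", None)


def lookup_in_worker():
    """Call get_active_session() in a new thread and return what it saw."""
    seen = []
    worker = threading.Thread(target=lambda: seen.append(get_active_session()))
    worker.start()
    worker.join()
    return seen[0]


# The main thread registers a session, as the thread that starts Spark would.
set_active_session("session-created-in-main-thread")

main_result = get_active_session()   # visible in the thread that set it
worker_result = lookup_in_worker()   # a request-handling thread sees nothing

print(main_result)
print(worker_result)
```

A backend that serves each request on its own thread would be in the position 
of the worker here, which matches the failure suspected in this issue.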

What we need for calling Sedona functions is a JVMView object. We can obtain 
this object from {{SparkContext._jvm}} instead of {{spark._jvm}}. This does not 
rely on any thread-local state and works correctly whenever there is an active 
Spark context in the current process.
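
A rough sketch of that lookup is shown below. The helper name 
{{resolve_jvm}} and its {{context_cls}} parameter are assumptions made for 
illustration and are not the actual Sedona change; the only PySpark detail 
relied on is the class-level attribute {{SparkContext._jvm}} mentioned above. 
The parameter exists so the logic can be exercised without a running Spark 
installation.

```python
from typing import Any, Optional


def resolve_jvm(context_cls: Optional[type] = None) -> Any:
    """Return the process-wide JVM view, independent of the calling thread.

    context_cls is a hypothetical hook for testing; when omitted, the real
    pyspark.SparkContext class is used.
    """
    if context_cls is None:
        # Imported lazily so the sketch can be loaded without PySpark.
        from pyspark import SparkContext
        context_cls = SparkContext

    # _jvm is read from the class, not from a session object, so no
    # thread-local "active session" lookup is involved.
    jvm = getattr(context_cls, "_jvm", None)
    if jvm is None:
        raise RuntimeError("No active Spark context in the current process")
    return jvm
```

Because the attribute lives on the class, a call such as {{resolve_jvm()}} 
made from a request-handling thread would return the same object as a call 
from the thread that started Spark, provided a Spark context exists.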





--
This message was sent by Atlassian Jira
(v8.20.10#820010)
