[GitHub] [sedona] umartin opened a new issue, #1040: Regression in Sedona 1.4.1 leading to OutOfMemoryException

via GitHub Thu, 28 Sep 2023 06:04:50 -0700


umartin opened a new issue, #1040:
URL: https://github.com/apache/sedona/issues/1040


   I recently upgraded our pipelines from Sedona 1.4.0 to Sedona 1.4.1 and one 
of them is failing with OutOfMemoryException. It fails even if I double the 
memory budget. I have tracked the regression down to this commit, which added 
metrics to the NestedLoopJudgement: 
https://github.com/apache/sedona/pull/851/files#diff-2eb9b76007b1dc6ede8341c8e0864b3b7a7d0e2d4e7203220ec1e74c590124f9
   
   ## Expected behavior
   
   No drastic increase in memory use.
   
   ## Actual behavior
   
   OOM
   
   ## Steps to reproduce the problem
   
   Run a job with no index and a large number of partitions. Sedona 1.4.1 needs 
significantly more memory than 1.4.0.
   
   ## Settings
   
   sedona.global.index=false
   sedona.join.numpartition=20000
   
   Sedona version = 1.4.1
   
   Apache Spark version = 3.4
   
   API type = Java and Python tested
   
   Scala version = 2.12
   
   JRE version = 17
   
   Python version = 3.11
   
   Environment = Standalone
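
For anyone trying to reproduce this, the two Sedona settings above can be applied to a running session through its runtime configuration. The snippet below is only a sketch: the helper name `apply_sedona_settings` is hypothetical and not part of Sedona, and it assumes `spark` is an existing PySpark `SparkSession` (only its `conf.set(key, value)` method is used). The join job itself is not shown.

```python
# Settings copied from the report; values are passed as strings.
SEDONA_SETTINGS = {
    "sedona.global.index": "false",
    "sedona.join.numpartition": "20000",
}


def apply_sedona_settings(spark, settings=None):
    """Set each key/value on the session's runtime config.

    `spark` is assumed to be a SparkSession (or any object exposing
    `conf.set(key, value)`). Returns the settings that were applied.
    """
    applied = dict(SEDONA_SETTINGS if settings is None else settings)
    for key, value in applied.items():
        spark.conf.set(key, value)
    return applied
```

Calling `apply_sedona_settings(spark)` before running the join under both 1.4.0 and 1.4.1 should keep the two runs comparable, assuming everything else (data, memory budget, cluster) is held constant.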


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@sedona.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
