[GitHub] [sedona] umartin commented on issue #1040: Regression in Sedona 1.4.1 leading to OutOfMemoryException

via GitHub Thu, 28 Sep 2023 06:14:50 -0700


umartin commented on issue #1040:
URL: https://github.com/apache/sedona/issues/1040#issuecomment-1739141695


With a custom build of v1.4.1 in which the metrics are either removed or replaced by a LongAccumulator, memory use shows no regression.
   
I think the custom metric class in Sedona is built on a misconception. Spark already tracks accumulators per task, so there is no need for a map accumulator. The Sedona Metrics class seems to carry a large memory overhead, especially when there is a large number of tasks.
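
To illustrate the scaling argument, here is a minimal, self-contained Java sketch that uses neither Spark nor Sedona. It assumes the metric is a map keyed by task ID; `MapMetric` and `SumMetric` are hypothetical names, not Sedona's Metrics class or Spark's AccumulatorV2 API. The merged map on the driver gains one entry per task, while a LongAccumulator-style sum stays a fixed-size value.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: neither class is Sedona's Metrics class or a Spark API.
public class AccumulatorOverheadSketch {

    // Map-style metric: one entry per task ID, so the value merged on the
    // driver grows linearly with the number of tasks.
    static final class MapMetric {
        final Map<Long, Long> perTask = new HashMap<>();

        void add(long taskId, long v) {
            perTask.merge(taskId, v, Long::sum);
        }

        void merge(MapMetric other) {
            other.perTask.forEach((k, v) -> perTask.merge(k, v, Long::sum));
        }
    }

    // Scalar metric in the spirit of Spark's LongAccumulator: the merged value
    // is a sum and a count, no matter how many tasks contribute.
    static final class SumMetric {
        long sum;
        long count;

        void add(long v) {
            sum += v;
            count++;
        }

        void merge(SumMetric other) {
            sum += other.sum;
            count += other.count;
        }
    }

    // Simulates numTasks tasks, each reporting 1 into its own local copy,
    // which is then merged into a driver-side MapMetric.
    static MapMetric runMapMetric(int numTasks) {
        MapMetric driver = new MapMetric();
        for (long taskId = 0; taskId < numTasks; taskId++) {
            MapMetric local = new MapMetric();
            local.add(taskId, 1);
            driver.merge(local);
        }
        return driver;
    }

    // Same simulation with the scalar metric.
    static SumMetric runSumMetric(int numTasks) {
        SumMetric driver = new SumMetric();
        for (int t = 0; t < numTasks; t++) {
            SumMetric local = new SumMetric();
            local.add(1);
            driver.merge(local);
        }
        return driver;
    }

    public static void main(String[] args) {
        int numTasks = 100_000;
        System.out.println("map entries on driver: " + runMapMetric(numTasks).perTask.size());
        System.out.println("scalar sum on driver: " + runSumMetric(numTasks).sum);
    }
}
```

Running `main` prints 100000 map entries next to a single sum of 100000. Since Spark already shows each task's accumulator update in the task details, keying the value by task adds memory on the driver without adding information.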
   
Current metrics in the Spark UI (leading to OOM when there are many tasks):

Accumulator summary:
![Screenshot from 2023-09-28 12-06-02](https://github.com/apache/sedona/assets/1275096/f3df7d6e-4865-4d5a-a1eb-0613a981a5a1)

Task details:
![image](https://github.com/apache/sedona/assets/1275096/c5f9b3f6-eb68-45b0-a15f-0da979c52e07)

With a LongAccumulator (no memory overhead):

![image](https://github.com/apache/sedona/assets/1275096/fb525923-a36a-4d5a-a903-edc97cc12da1)

Task details:
![image](https://github.com/apache/sedona/assets/1275096/dcdbf13f-eda3-4ca7-be15-4dbe90f7f19b)
   

