[
https://issues.apache.org/jira/browse/SPARK-38792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521852#comment-17521852
]
Danny Guinther commented on SPARK-38792:
----------------------------------------
I'm getting the impression that the problem may be with some code that
Databricks bolts on to Spark. I'd say ignore this ticket unless you hear
otherwise.
> Regression in time executor takes to do work sometime after v3.0.1 ?
> --------------------------------------------------------------------
>
> Key: SPARK-38792
> URL: https://issues.apache.org/jira/browse/SPARK-38792
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.2.1
> Reporter: Danny Guinther
> Priority: Major
> Attachments: dummy-job-job.jpg, dummy-job-query.png,
> executor-timing-debug-number-2.jpg, executor-timing-debug-number-4.jpg,
> executor-timing-debug-number-5.jpg, min-time-way-up.jpg,
> what-is-this-code.jpg, what-s-up-with-exec-actions.jpg
>
>
> Hello!
> I'm sorry to trouble you with this, but I'm seeing a noticeable regression in
> performance when upgrading from 3.0.1 to 3.2.1 and I can't pin down why. I
> don't believe it is specific to my application since the upgrade from 3.0.1 to
> 3.2.1 is purely a configuration change. I'd guess it presents itself in my
> application due to the high volume of work my application does, but I could
> be mistaken.
> The gist is that the executor actions I'm running suddenly appear to take a
> lot longer on Spark 3.2.1. I don't have any ability to test
> versions between 3.0.1 and 3.2.1 because my application was previously
> blocked from upgrading beyond Spark 3.0.1 by
> https://issues.apache.org/jira/browse/SPARK-37391 (which I helped to fix).
> Any ideas what might cause this or metrics I might try to gather to pinpoint
> the problem? I've tried a bunch of the suggestions from
> [https://spark.apache.org/docs/latest/tuning.html] to see if any of those
> help, but none of the adjustments I've tried have been fruitful. I also tried
> to look in [https://spark.apache.org/docs/latest/sql-migration-guide.html]
> for ideas as to what might have changed to cause this behavior, but haven't
> seen anything that sticks out as being a possible source of the problem.
> I have attached a graph that shows the drastic change in time taken by
> executor actions. In the image the blue and purple lines are different kinds
> of reads using the built-in JDBC data reader and the green line is writes
> using a custom-built data writer. The deploy to switch from 3.0.1 to 3.2.1
> occurred at 9AM on the graph. The graph data comes from timing blocks that
> surround only the calls to dataframe actions, so there shouldn't be anything
> specific to my application that is suddenly inflating these numbers. The
> specific actions I'm invoking are: count() (but there's some transforming and
> caching going on, so it's really more than that); first(); and write().
> The driver process does seem to be seeing more GC churn than with Spark
> 3.0.1, but I don't think that explains this behavior. The executors don't
> seem to have any problem with memory or GC and are not overutilized (our
> pipeline is very read and write heavy, less heavy on transformations, so
> executors tend to be idle while waiting for various network I/O).
>
> Thanks in advance for any help!
--
This message was sent by Atlassian Jira
(v8.20.1#820001)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]