We also confirmed that we have already backported the fix for https://issues.apache.org/jira/browse/SPARK-33557.

On Mon, Dec 20, 2021 at 11:08 AM Senthil Kumar <sen...@gmail.com> wrote:

> @abhishek. We use Spark 3.1.*
>
> On Mon, 20 Dec 2021, 09:50 Rao, Abhishek (Nokia - IN/Bangalore), <abhishek....@nokia.com> wrote:
>
>> Hi Senthil,
>>
>> Which version of Spark 3 are you using? We had this kind of observation
>> with Spark 3.0.2 and 3.1.x, but then we figured out that we had configured
>> a large value for spark.network.timeout, and this value was not taking
>> effect in releases prior to 3.0.2.
>>
>> This was fixed as part of
>> https://issues.apache.org/jira/browse/SPARK-33557. Because we had
>> configured a large value for spark.network.timeout, the TPCDS queries
>> took a long time when run with Spark 3.0.2 and 3.1.x. Once we corrected
>> it, we observed that the queries executed much faster.
>>
>> Thanks and Regards,
>> Abhishek
>>
>> *From:* Senthil Kumar <sen...@gmail.com>
>> *Sent:* Sunday, December 19, 2021 11:58 PM
>> *To:* dev <dev@spark.apache.org>
>> *Subject:* Spark 3 is Slower than Spark 2 for TPCDS Q04 query.
>>
>> Hi All,
>>
>> We are comparing Spark 2.4.5 and Spark 3 (without enabling Spark 3
>> additional features) with TPCDS queries and found that Spark 3's
>> performance is worse by at least 30-40% compared to Spark 2.4.5.
>>
>> E.g.
>>
>> Data size used: 1 TB
>>
>> Spark 2.4.5 finishes Q4 in 1.5 min, but Spark 3.* takes at least 2.5 min.
>>
>> Note: We tested this in the same cluster with the same size of data, and
>> we ensured that the parameters we passed are the same for Spark 2.4.*
>> and Spark 3.*.
>>
>> It would be helpful to know whether anyone else has encountered the same
>> issue in their benchmarking activities. If so, please share your input
>> on what could be the reason behind this poor performance.
>>
>> --
>> Senthil kumar
>
> --
> Senthil kumar
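
For anyone wanting to rule out the spark.network.timeout issue mentioned above, here is a minimal sketch of a check against the text of a spark-defaults.conf file. The thread does not say what value counted as "large", so the 5x threshold below is purely an assumption, and the helper names (to_seconds, find_network_timeout, is_suspiciously_large) are hypothetical. The 120s figure is Spark's documented default for spark.network.timeout. The parser handles only simple "key value" and "key=value" lines.

```python
import re

# Spark's documented default for spark.network.timeout, in seconds.
DEFAULT_NETWORK_TIMEOUT_S = 120

# Time suffixes handled by this sketch; a bare number is treated as seconds.
_UNIT_SECONDS = {"ms": 0.001, "s": 1, "m": 60, "min": 60, "h": 3600, "d": 86400}


def to_seconds(value):
    """Convert a time string such as '120s' or '10m' to seconds."""
    match = re.fullmatch(r"\s*(\d+)\s*([a-z]*)\s*", value.lower())
    if not match:
        raise ValueError(f"unrecognised time value: {value!r}")
    number, unit = match.groups()
    if unit == "":
        return float(number)
    if unit not in _UNIT_SECONDS:
        raise ValueError(f"unrecognised time unit: {unit!r}")
    return int(number) * _UNIT_SECONDS[unit]


def find_network_timeout(conf_text):
    """Return the last spark.network.timeout value in conf_text, or None."""
    found = None
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = re.split(r"[\s=]+", line, maxsplit=1)
        if len(parts) == 2 and parts[0] == "spark.network.timeout":
            found = parts[1].strip()
    return found


def is_suspiciously_large(conf_text, factor=5):
    """True if the configured timeout exceeds factor x the default.

    The factor is an arbitrary assumption, not a value from the thread.
    """
    value = find_network_timeout(conf_text)
    if value is None:
        return False  # not set, so the default applies
    return to_seconds(value) > factor * DEFAULT_NETWORK_TIMEOUT_S


if __name__ == "__main__":
    sample = "spark.executor.memory 8g\nspark.network.timeout 100000s\n"
    print(find_network_timeout(sample), is_suspiciously_large(sample))
```

If the check flags a value, rerunning the query with the setting removed or reduced is one way to see whether it explains the difference; it does not by itself prove that the timeout is the cause of the slowdown.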