We also checked and confirmed that we have already backported the fix for
https://issues.apache.org/jira/browse/SPARK-33557.
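
For anyone who wants to check the spark.network.timeout setting discussed in the quoted reply below, here is a minimal sketch in plain Python. It assumes the setting lives in a spark-defaults.conf style text with one key and value per line; the helper names parse_spark_time and network_timeout_ratio are hypothetical, not part of Spark. The only Spark facts it relies on are the documented default of 120s and the usual time-string suffixes.

```python
import re

# Suffixes Spark accepts in time strings, converted to seconds.
_UNIT_SECONDS = {"us": 1e-6, "ms": 1e-3, "s": 1, "m": 60, "min": 60,
                 "h": 3600, "d": 86400}

# Documented default for spark.network.timeout, in seconds.
DEFAULT_NETWORK_TIMEOUT_S = 120


def parse_spark_time(value, default_unit="s"):
    """Convert a Spark-style time string such as '120s' or '10min' to seconds.

    Hypothetical helper: a bare number is assumed to be in `default_unit`.
    """
    match = re.fullmatch(r"(-?\d+)([a-z]+)?", value.strip().lower())
    if match is None:
        raise ValueError(f"not a time string: {value!r}")
    number, unit = match.groups()
    unit = unit or default_unit
    if unit not in _UNIT_SECONDS:
        raise ValueError(f"unknown time unit: {unit!r}")
    return int(number) * _UNIT_SECONDS[unit]


def network_timeout_ratio(conf_text):
    """Return configured spark.network.timeout divided by the default.

    Returns None when the key is not set. If the key appears more than
    once, the last occurrence is used.
    """
    ratio = None
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Assumption: key and value are separated by whitespace or '='.
        parts = re.split(r"[\s=]+", line, maxsplit=1)
        if len(parts) == 2 and parts[0] == "spark.network.timeout":
            ratio = parse_spark_time(parts[1]) / DEFAULT_NETWORK_TIMEOUT_S
    return ratio
```

A ratio well above 1 means the configured timeout is much larger than the default, which is the situation described in the quoted reply. Passing the contents of your own configuration file to network_timeout_ratio gives a quick way to compare the Spark 2.4 and Spark 3 setups.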

On Mon, Dec 20, 2021 at 11:08 AM Senthil Kumar <sen...@gmail.com> wrote:

> @Abhishek: we use Spark 3.1*
>
> On Mon, 20 Dec 2021, 09:50 Rao, Abhishek (Nokia - IN/Bangalore), <
> abhishek....@nokia.com> wrote:
>
>> Hi Senthil,
>>
>>
>>
>> Which version of Spark 3 are you using? We had this kind of observation
>> with Spark 3.0.2 and 3.1.x, but then we figured out that we had configured
>> a large value for spark.network.timeout, and this value was not taking
>> effect in any release prior to 3.0.2.
>>
>> This was fixed as part of
>> https://issues.apache.org/jira/browse/SPARK-33557. Because we had
>> configured a large value for spark.network.timeout, the TPCDS queries
>> took a long time when run with Spark 3.0.2 and 3.1.x. Once we corrected
>> the value, the queries ran much faster.
>>
>>
>>
>> Thanks and Regards,
>>
>> Abhishek
>>
>>
>>
>> *From:* Senthil Kumar <sen...@gmail.com>
>> *Sent:* Sunday, December 19, 2021 11:58 PM
>> *To:* dev <dev@spark.apache.org>
>> *Subject:* Spark 3 is Slower than Spark 2 for TPCDS Q04 query.
>>
>>
>>
>> Hi All,
>>
>> We are comparing Spark 2.4.5 and Spark 3 (without enabling Spark 3's
>> additional features) on TPCDS queries and found that Spark 3's
>> performance is lower by at least 30-40% compared to Spark 2.4.5.
>>
>>
>>
>> For example, with a data size of 1 TB:
>>
>> Spark 2.4.5 finishes Q4 in 1.5 min, but Spark 3.* takes at least 2.5
>> min.
>>
>>
>>
>> Note: We tested this on the same cluster with the same data size, and
>> we ensured that the parameters we passed were identical for Spark 2.4*
>> and Spark 3*.
>>
>>
>>
>> It would be helpful to know whether any of you have encountered the same
>> issue in your benchmarking activities. If so, please share your input on
>> what could be the reason behind this poor performance.
>>
>>
>>
>> --
>>
>> Senthil kumar
>>
>
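
To make the size of the gap concrete, the Q4 timings quoted above (1.5 min on Spark 2.4.5, at least 2.5 min on Spark 3.*) can be expressed two ways. The sketch below is plain arithmetic; the function name compare_runtimes is only illustrative.

```python
def compare_runtimes(baseline_min, candidate_min):
    """Return (extra runtime %, throughput drop %) of candidate vs baseline."""
    # How much longer the candidate takes, relative to the baseline runtime.
    extra_runtime_pct = (candidate_min - baseline_min) / baseline_min * 100
    # How much less work per unit time the candidate does (throughput = 1 / runtime).
    throughput_drop_pct = (1 - baseline_min / candidate_min) * 100
    return extra_runtime_pct, throughput_drop_pct


# Timings from the original post: Spark 2.4.5 vs Spark 3.* on TPCDS Q4.
extra, drop = compare_runtimes(1.5, 2.5)
print(f"runtime increase: {extra:.1f}%")   # prints "runtime increase: 66.7%"
print(f"throughput drop:  {drop:.1f}%")    # prints "throughput drop:  40.0%"
```

So the 2.5 min figure corresponds to roughly a 40% drop in throughput, or about 67% more wall-clock time, depending on which way the comparison is stated.
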

-- 
Senthil kumar
