So you union two tables, union the result with another one, and finally with a last one?

How many columns do all these tables have?

Are you sure creating the plan depends on the number of rows?
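For concreteness, a minimal sketch of the chain I read you as describing
(dfA through dfD are hypothetical DataFrames with identical schemas):

    // each union call only builds a logical plan; nothing executes yet
    val step1  = dfA.union(dfB)      // union two tables
    val step2  = step1.union(dfC)    // union the result with another one
    val result = step2.union(dfD)    // finally with a last one
    // Catalyst's CombineUnions rule flattens adjacent unions into a single
    // n-ary Union node, so chaining alone should not deepen the plan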

Enrico


On 22.02.23 at 19:08, Prem Sahoo wrote:
Here is the missing information:
1. Spark 3.2.0
2. It is Scala based
3. The tables are about 60 GB in size
4. The explain plan shows that a lot of time is spent by Catalyst in creating the plan
5. The number of unioned tables is 2, then another 2, then finally 2

The slowness in producing the result grows as the data size and the number of columns increase.
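
A minimal sketch of one way to separate planning time from execution time
(assuming the final unioned DataFrame is called unioned):

    // forces analysis, optimization and physical planning; reads no data
    val t0 = System.nanoTime()
    unioned.queryExecution.executedPlan
    println(s"planning:  ${(System.nanoTime() - t0) / 1e6} ms")

    // the Spark 3.x "noop" sink runs the full query but writes nothing
    val t1 = System.nanoTime()
    unioned.write.format("noop").mode("overwrite").save()
    println(s"execution: ${(System.nanoTime() - t1) / 1e6} ms")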

On Wed, Feb 22, 2023 at 11:07 AM Enrico Minack <i...@enrico.minack.dev> wrote:

    Plus, the number of unioned tables would be helpful, as well as which
    downstream operations are performed on the unioned tables.

    And what "performance issues" exactly do you measure?

    Enrico



    On 22.02.23 at 16:50, Mich Talebzadeh wrote:
    Hi,

    A few details will help:

     1. Spark version
     2. Spark SQL, Scala or PySpark?
     3. Size of the tables being unioned
     4. What does explain() on the union show? (see the sketch below)
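
    For item 4, a minimal sketch, assuming df holds the union result:

        // Spark 3.x explain modes; "formatted" separates the plan tree
        // from per-node details
        df.explain("formatted")
        df.explain("cost")  // includes logical-plan statistics when available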


    HTH


    View my LinkedIn profile:
    <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>


    https://en.everybodywiki.com/Mich_Talebzadeh

    Disclaimer: Use it at your own risk. Any and all responsibility
    for any loss, damage or destruction of data or any other property
    which may arise from relying on this email's technical content is
    explicitly disclaimed. The author will in no case be liable for
    any monetary damages arising from such loss, damage or destruction.



    On Wed, 22 Feb 2023 at 15:42, Prem Sahoo <prem.re...@gmail.com>
    wrote:

        Hello Team,
        We are observing Spark union performance issues when unioning
        big tables with lots of rows. Do we have any option apart
        from union?
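
        One workaround sometimes suggested when many DataFrame unions are
        slow to plan is a single RDD-level union, which sidesteps building
        the Union logical plan. A sketch, assuming spark is the active
        SparkSession and dfs: Seq[DataFrame] all share one schema:

            import org.apache.spark.sql.{DataFrame, SparkSession}

            def unionViaRdd(spark: SparkSession, dfs: Seq[DataFrame]): DataFrame = {
              // SparkContext.union takes all RDDs at once (n-ary union)
              val rows = spark.sparkContext.union(dfs.map(_.rdd))
              // note: dropping to RDDs loses Catalyst optimization across the union
              spark.createDataFrame(rows, dfs.head.schema)
            }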

