[ https://issues.apache.org/jira/browse/SPARK-13140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
spencerlee updated SPARK-13140:
-------------------------------
    Remaining Estimate: 10h  (was: 168h)
     Original Estimate: 10h  (was: 168h)

> spark sql aggregate performance decrease
> -------------------------------------------
>
>                 Key: SPARK-13140
>                 URL: https://issues.apache.org/jira/browse/SPARK-13140
>             Project: Spark
>          Issue Type: Question
>    Affects Versions: 1.6.0
>            Reporter: spencerlee
>   Original Estimate: 10h
>  Remaining Estimate: 10h
>
> In our scenario, there are 30+ key columns and 60+ metric columns.
> Our typical query is:
>   select key1, key2, key3, key4, key5, sum(metric1), sum(metric2),
>   sum(metric3), ..., sum(metric30) from table_name group by key1, key2,
>   key3, key4, key5
> I imported a single Parquet file (60 MB, about 2.5 million records) into
> Spark SQL and ran the typical query in local mode. When I aggregate only
> 24 metrics, the response time is about 4.81s. When I aggregate 25 or more
> metrics, the response time is 45.9s, which is almost 10 times slower. That
> seems unreasonable.
> Is this a bug, or do I need to change some configuration to tune the query?
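
To narrow down where the slowdown starts, it may help to generate the same query with a varying number of sum() columns. The following is only a sketch: the helper name build_query is hypothetical, and the column and table names (key1..keyN, metric1..metricN, table_name) are taken from the query shape in the report, not from a real schema.

```python
def build_query(n_metrics, n_keys=5, table="table_name"):
    """Build the aggregate query from the report with n_metrics sum() columns.

    Column names follow the key1..keyN / metric1..metricN pattern used in
    the report; adjust them to the real schema.
    """
    if n_metrics < 1 or n_keys < 1:
        raise ValueError("n_metrics and n_keys must be at least 1")
    keys = ", ".join(f"key{i}" for i in range(1, n_keys + 1))
    sums = ", ".join(f"sum(metric{i})" for i in range(1, n_metrics + 1))
    return f"select {keys}, {sums} from {table} group by {keys}"


if __name__ == "__main__":
    # Print the two queries on either side of the reported threshold.
    for n in (24, 25):
        print(build_query(n))
```

Each generated string could then be passed to sqlContext.sql(...) on Spark 1.6 and timed around an action such as collect(). Comparing the explain() output of the 24-metric and 25-metric queries might show whether the physical plan changes between the two. That step is an assumption about how to investigate, not something stated in the report.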