[ https://issues.apache.org/jira/browse/SPARK-35192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Takeshi Yamamuro resolved SPARK-35192.
--------------------------------------
    Fix Version/s: 3.2.0
         Assignee: Takeshi Yamamuro
       Resolution: Fixed

Resolved by https://github.com/apache/spark/pull/32243

> Port minimal TPC-DS datagen code from databricks/spark-sql-perf
> ---------------------------------------------------------------
>
>                 Key: SPARK-35192
>                 URL: https://issues.apache.org/jira/browse/SPARK-35192
>             Project: Spark
>          Issue Type: Test
>          Components: SQL, Tests
>    Affects Versions: 3.2.0
>            Reporter: Takeshi Yamamuro
>            Assignee: Takeshi Yamamuro
>            Priority: Minor
>             Fix For: 3.2.0
>
>
> This PR aims to port the minimal code needed to generate TPC-DS data from
> databricks/spark-sql-perf. The classes in a new class file, tpcdsDatagen.scala,
> are basically copied from the databricks/spark-sql-perf codebase.
> We now frequently use TPC-DS data for benchmarks/tests, but the classes that
> define the TPC-DS schemas for datagen and for benchmarks/tests are maintained
> separately, e.g.,
> https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/TPCDSBase.scala
> https://github.com/databricks/spark-sql-perf/blob/master/src/main/scala/com/databricks/spark/sql/perf/tpcds/TPCDSTables.scala
> I think this causes some inconvenience, e.g., we need to update both files
> in the separate repositories whenever we update the TPC-DS schema (#32037).
> So, it would be useful for the Spark codebase to generate the data and the
> test tables by referring to the same schema definition.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)