[ https://issues.apache.org/jira/browse/HUDI-388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
lamber-ken updated HUDI-388:
----------------------------
Description:
Currently, Hudi offers some tools to operate an ecosystem of Hudi datasets, including hudi-cli, metrics, and the Spark UI [1]. It would be easier for admins to manage Hudi datasets through customized DDL SQL statements instead of via hudi-cli.

After SPARK-18127, we can customize the Spark session with our own optimizer, parser, analyzer, and physical plan strategy rules. Extending the Spark session takes three steps:
1. A tool to parse the SparkSQL statements, such as ANTLR.
2. A class that uses org.apache.spark.sql.SparkSessionExtensions to inject the parser.
3. Run the customized statements by extending org.apache.spark.sql.execution.command.RunnableCommand.

Demo

1. Extend SparkSessionExtensions
{code:java}
class HudiSparkSessionExtension extends (SparkSessionExtensions => Unit) {
  override def apply(extensions: SparkSessionExtensions): Unit = {
    extensions.injectParser { (session, parser) =>
      new HudiDDLParser(parser)
    }
  }
}
{code}

2. Extend RunnableCommand
{code:java}
case class HudiStatCommand(path: String) extends RunnableCommand {

  override val output: Seq[Attribute] = {
    Seq(
      AttributeReference("Results", IntegerType, nullable = false)(),
      AttributeReference("Results2", IntegerType, nullable = false)()
    )
  }

  override def run(sparkSession: SparkSession): Seq[Row] = {
    Seq(Row(22, 456))
  }
}
{code}

[1] [http://hudi.apache.org/admin_guide.html]
https://issues.apache.org/jira/browse/SPARK-18127

was:
Currently, Hudi offers some tools to operate an ecosystem of Hudi datasets, including hudi-cli, metrics, and the Spark UI [1]. It would be easier for admins to manage Hudi datasets through customized DDL SQL statements instead of via hudi-cli.

After SPARK-18127, we can customize the Spark session with our own optimizer, parser, analyzer, and physical plan strategy rules in Spark.
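As a usage sketch (not part of the issue itself): once the demo classes above are compiled onto the driver classpath, the extension can be wired into a session through the spark.sql.extensions configuration that SPARK-18127 introduced. The fully-qualified class name below is an assumption for illustration; the actual package would be decided when implementing this.

{code:java}
import org.apache.spark.sql.SparkSession

// Hypothetical wiring; "org.apache.hudi.HudiSparkSessionExtension" is an
// assumed fully-qualified name for the HudiSparkSessionExtension demo class.
// spark.sql.extensions expects a class with a no-arg constructor that is a
// Function1[SparkSessionExtensions, Unit], which is exactly the demo pattern.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("hudi-ddl-demo")
  .config("spark.sql.extensions", "org.apache.hudi.HudiSparkSessionExtension")
  .getOrCreate()

// Statements recognized by HudiDDLParser would now be parsed into commands
// such as HudiStatCommand and executed via spark.sql(...).
{code}

Alternatively, SparkSession.Builder.withExtensions(...) can register the same injection function programmatically without the configuration key.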
[http://hudi.apache.org/admin_guide.html]


> Support DDL / DML SparkSQL statements
> -------------------------------------
>
>                 Key: HUDI-388
>                 URL: https://issues.apache.org/jira/browse/HUDI-388
>             Project: Apache Hudi (incubating)
>          Issue Type: New Feature
>            Reporter: lamber-ken
>            Priority: Major
>


--
This message was sent by Atlassian Jira
(v8.3.4#803005)