[jira] [Created] (SPARK-2094) Ensure exactly once semantics for DDL / Commands

Michael Armbrust (JIRA) Tue, 10 Jun 2014 01:35:24 -0700

Michael Armbrust created SPARK-2094:
---------------------------------------


             Summary: Ensure exactly once semantics for DDL / Commands
                 Key: SPARK-2094
                 URL: https://issues.apache.org/jira/browse/SPARK-2094
             Project: Spark
          Issue Type: Bug
          Components: SQL
            Reporter: Michael Armbrust
             Fix For: 1.1.0


From [~lian cheng]...
The constraints presented here are:

 * The side effect of a command SchemaRDD should take place eagerly;
 * The side effect of a command SchemaRDD should take place once and only once;
 * When the .collect() method is called, something meaningful, usually the 
output message lines of the command, should be returned.

Then how about adding a lazy field inside all the physical command nodes to 
wrap up the side effect and hold the command output? Take the 
SetCommandPhysical as an example:
{code}
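// Imports assume the Spark 1.0-era package layout.
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.catalyst.expressions.{Attribute, GenericRow, Row}

// Marks a physical plan node that performs a side effect when run.
trait PhysicalCommand {
  // Implementations override this with a lazy val so that the side effect
  // runs at most once and its output is cached.
  def commandOutput: Any
}

case class SetCommandPhysical(
    key: Option[String], value: Option[String], output: Seq[Attribute])(
    @transient context: SQLContext)
  extends PhysicalCommand {

  override lazy val commandOutput: Any = {
    // Perform the side effect, and record appropriate output
    ???
  }

  def execute(): RDD[Row] = {
    val row = new GenericRow(Array[Any](commandOutput))
    // parallelize expects a Seq, so wrap the single row
    context.sparkContext.parallelize(Seq(row), 1)
  }
}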
{code}
In this way, all the constraints are met:

 * Eager evaluation: done by the toRdd call in SchemaRDDLike (PR #948),
 * Side effect should take place once and only once: ensured by the lazy 
commandOutput field,
 * Present meaningful output as RDD contents: command output is held by 
commandOutput and returned in execute().

An additional benefit is that the side-effect logic of each command can be 
implemented within its own physical command node, instead of adding special 
cases inside SQLContext.toRdd and/or HiveContext.toRdd.
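
To make the once-and-only-once guarantee concrete, here is a minimal sketch 
that runs without Spark. The names Command, SetCommand, runs and 
LazyCommandDemo are hypothetical stand-ins invented for illustration, not 
Spark classes, and a mutable Map is assumed in place of the real SQL 
configuration:
{code}
import scala.collection.mutable

// Hypothetical stand-in for a physical command node (not a Spark class).
trait Command {
  def commandOutput: Any
  // Mirrors execute() in the proposal: the result is just the command output.
  def execute(): Seq[Any] = Seq(commandOutput)
}

// Hypothetical command that sets a key in a mutable configuration map.
class SetCommand(key: String, value: String, conf: mutable.Map[String, String])
  extends Command {

  // Counts how many times the side effect actually ran.
  var runs = 0

  // A lazy val is initialized on first access and cached afterwards,
  // so the body below runs at most once per instance.
  override lazy val commandOutput: Any = {
    runs += 1
    conf(key) = value
    s"$key=$value"
  }
}

object LazyCommandDemo {
  def main(args: Array[String]): Unit = {
    val conf = mutable.Map.empty[String, String]
    val cmd = new SetCommand("some.key", "10", conf)

    cmd.execute() // first call performs the side effect
    cmd.execute() // later calls reuse the cached output
    println(cmd.runs) // prints 1
  }
}
{code}
Note that the guarantee is per node instance: constructing a second 
SetCommand with the same arguments would run the side effect again.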



--
This message was sent by Atlassian JIRA
(v6.2#6252)
