Possible bug in DatasourceV2

assaf.mendelson Wed, 10 Oct 2018 23:48:59 -0700

Hi,

I created a data source that implements a writer but no reader. When I
call save on it, I get an exception:
org.apache.spark.sql.AnalysisException: Data source is not readable:
DefaultSource


The reason is that when save is called, the branch that matches the source
against WriteSupport contains the following code:

      val source = cls.newInstance().asInstanceOf[DataSourceV2]
      source match {
        case ws: WriteSupport =>
          val sessionOptions = DataSourceV2Utils.extractSessionConfigs(
            source,
            df.sparkSession.sessionState.conf)
          val options = sessionOptions ++ extraOptions
          val relation = DataSourceV2Relation.create(source, options)  // <-- this line

          if (mode == SaveMode.Append) {
            runCommand(df.sparkSession, "save") {
              AppendData.byName(relation, df.logicalPlan)
            }

          } else {
            val writer = ws.createWriter(
              UUID.randomUUID.toString, df.logicalPlan.output.toStructType, mode,
              new DataSourceOptions(options.asJava))

            if (writer.isPresent) {
              runCommand(df.sparkSession, "save") {
                WriteToDataSourceV2(writer.get, df.logicalPlan)
              }
            }
          }

However, DataSourceV2Relation.create eagerly creates a reader
(source.createReader) to extract the schema:

  def create(
      source: DataSourceV2,
      options: Map[String, String],
      tableIdent: Option[TableIdentifier] = None,
      userSpecifiedSchema: Option[StructType] = None): DataSourceV2Relation = {
    val reader = source.createReader(options, userSpecifiedSchema)
    val ident = tableIdent.orElse(tableFromOptions(options))
    DataSourceV2Relation(
      source, reader.readSchema().toAttributes, options, ident, userSpecifiedSchema)
  }
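
To make the failure mode concrete, here is a minimal sketch that does not
depend on Spark. All traits and helpers below are simplified, hypothetical
stand-ins for the real interfaces (the real code throws AnalysisException;
the sketch uses a plain exception), so treat it as an illustration of the
control flow only:

```scala
// Simplified stand-ins for the Spark 2.4 interfaces; hypothetical, not the real API.
object WriteOnlyDemo {
  trait DataSourceV2 { def name: String }
  trait ReadSupport extends DataSourceV2 { def readSchema(): Seq[String] }
  trait WriteSupport extends DataSourceV2

  // Stand-in for source.createReader: anything that is not readable is rejected.
  // (Spark throws AnalysisException; a plain exception is used here.)
  def createReader(source: DataSourceV2): ReadSupport = source match {
    case rs: ReadSupport => rs
    case _ =>
      throw new UnsupportedOperationException(
        s"Data source is not readable: ${source.name}")
  }

  // Stand-in for DataSourceV2Relation.create: the schema comes from a reader.
  def createRelation(source: DataSourceV2): Seq[String] =
    createReader(source).readSchema()

  // A source that only implements the write side.
  object DefaultSource extends WriteSupport { val name = "DefaultSource" }

  // Building the relation fails before any write is attempted.
  val attempt: scala.util.Try[Seq[String]] =
    scala.util.Try(createRelation(DefaultSource))

  def main(args: Array[String]): Unit =
    println(attempt)
}
```

In this model the write path never gets a chance to run, because the
relation is built unconditionally and building it requires a reader.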


This makes me a little confused.

First, the schema is defined by the DataFrame itself, not by the data
source, i.e. it should be taken from df.schema rather than obtained through
source.createReader.

Second, I see that relation is actually only used if the mode is
SaveMode.Append (by the way, this means that if it is needed at all, it
should be defined inside the "if"). I am not sure I understand the
AppendData part, but why would it involve reading from the source?
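
Roughly, the structure I would have expected looks like the sketch below.
It is a self-contained model and not Spark code: Frame, Relation, EchoSource
and save are hypothetical names made up for illustration, and I may well be
missing a reason why AppendData needs the schema reported by the source
itself.

```scala
// Hypothetical model of the save path; none of these types are Spark's.
object SaveSketch {
  sealed trait SaveMode
  case object Append extends SaveMode
  case object Overwrite extends SaveMode

  // Stand-in for a DataFrame: only its schema matters here.
  final case class Frame(schema: Seq[String])

  // A write-only source: there is no reader to ask for a schema.
  trait WriteSupport {
    def write(schema: Seq[String], mode: SaveMode): String
  }

  // Stand-in for the relation, built from the schema of the data being written.
  final case class Relation(source: WriteSupport, output: Seq[String])

  def save(source: WriteSupport, df: Frame, mode: SaveMode): String =
    mode match {
      case Append =>
        // The relation is created only in the branch that uses it.
        val relation = Relation(source, df.schema)
        relation.source.write(relation.output, mode)
      case _ =>
        source.write(df.schema, mode)
    }

  // Toy source that reports the mode and schema it was asked to write.
  object EchoSource extends WriteSupport {
    def write(schema: Seq[String], mode: SaveMode): String =
      s"$mode:${schema.mkString(",")}"
  }

  def main(args: Array[String]): Unit = {
    val df = Frame(Seq("id", "name"))
    println(save(EchoSource, df, Append))
    println(save(EchoSource, df, Overwrite))
  }
}
```

With this shape a source without a reader can be saved to in either mode,
since nothing on the write path asks it for a schema.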

Am I missing something here?

Thanks,
   Assaf


