[
https://issues.apache.org/jira/browse/SPARK-6495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14378051#comment-14378051
]
Cheng Lian commented on SPARK-6495:
-----------------------------------
Inserting a subset of the columns in the original schema only makes sense for
data sources that support schema evolution and can therefore reconcile
different but compatible schemas. For other data sources, e.g. CSV, this
behavior only produces malformed data.
Since 1.3.0, the Spark SQL Parquet data source supports schema evolution, so
you can do this in Spark 1.3 with Parquet.
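A minimal sketch of what this looks like with the Spark 1.3 API (the path
/tmp/t and the variable names are illustrative, and this assumes a running
SparkContext named sc; Parquet schema merging is enabled by default in 1.3):

```scala
import org.apache.spark.sql.{SQLContext, SaveMode}

val sqlContext = new SQLContext(sc)

// Existing Parquet table with columns a, b, c, d
val full = sqlContext.parquetFile("/tmp/t")

// Append new rows carrying only a subset of the columns (a, b, c)
val subset = full.select("a", "b", "c")
subset.save("/tmp/t", "parquet", SaveMode.Append)

// Reading the table back merges the two file schemas; rows written
// without column d surface it as null
val merged = sqlContext.parquetFile("/tmp/t")
```

This works only because Parquet reconciles the two file schemas at read time;
the same append against a CSV-backed table would just write short rows.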
> DataFrame#insertInto method should support insert rows with sub-columns
> -----------------------------------------------------------------------
>
> Key: SPARK-6495
> URL: https://issues.apache.org/jira/browse/SPARK-6495
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Chaozhong Yang
>
> The original table's schema is like this:
> |-- a: string (nullable = true)
> |-- b: string (nullable = true)
> |-- c: string (nullable = true)
> |-- d: string (nullable = true)
> If we want to insert one row (which can be transformed into a DataFrame)
> with this schema:
> |-- a: string (nullable = true)
> |-- b: string (nullable = true)
> |-- c: string (nullable = true)
> Of course, that operation will fail. Actually, in many cases people need to
> insert new rows whose columns are a subset of the original table's columns.
> If we can support this, Spark SQL's insertion will be more valuable to
> users.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]