[
https://issues.apache.org/jira/browse/SPARK-6495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14378051#comment-14378051
]
Cheng Lian commented on SPARK-6495:
-----------------------------------
Inserting a subset of the columns in the original schema only makes sense for
data sources that support schema evolution and can therefore reconcile
different but compatible schemas. For other data sources, e.g. CSV, this
behavior only produces malformed data.
Since 1.3.0, the Spark SQL Parquet data source supports schema evolution, so
you can do this in Spark 1.3 with Parquet.
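A minimal sketch of what this looks like with the Spark 1.3 API (the path
/tmp/t and the variable names are illustrative, and this assumes a running
SparkContext named sc; Parquet schema merging is enabled by default in 1.3):

```scala
import org.apache.spark.sql.{SQLContext, SaveMode}

val sqlContext = new SQLContext(sc)

// Existing Parquet table with columns a, b, c, d
val full = sqlContext.parquetFile("/tmp/t")

// Append new rows carrying only a subset of the columns (a, b, c)
val subset = full.select("a", "b", "c")
subset.save("/tmp/t", "parquet", SaveMode.Append)

// Reading the table back merges the two file schemas; rows written
// without column d surface it as null
val merged = sqlContext.parquetFile("/tmp/t")
```

This works only because Parquet reconciles the two file schemas at read time;
the same append against a CSV-backed table would just write short rows.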
> DataFrame#insertInto method should support insert rows with sub-columns
> -----------------------------------------------------------------------
>
> Key: SPARK-6495
> URL: https://issues.apache.org/jira/browse/SPARK-6495
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Chaozhong Yang
>
> The original table's schema is like this:
> |-- a: string (nullable = true)
> |-- b: string (nullable = true)
> |-- c: string (nullable = true)
> |-- d: string (nullable = true)
> If we want to insert one row (which can be transformed into a DataFrame)
> with this schema:
> |-- a: string (nullable = true)
> |-- b: string (nullable = true)
> |-- c: string (nullable = true)
> Of course, that operation will fail. Actually, in many cases people need to
> insert new rows whose columns are a subset of the original table's columns.
> If we can support this, Spark SQL's insertion will be more valuable to
> users.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]