[ 
https://issues.apache.org/jira/browse/SPARK-32838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17194644#comment-17194644
 ] 

CHC edited comment on SPARK-32838 at 3/25/22, 10:01 AM:
--------------------------------------------------------

After spending a long time exploring,

I found that
[HiveStrategies.scala|https://github.com/apache/spark/blob/v3.0.0/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala#L209-L215]
converts the HiveTableRelation to a LogicalRelation,

and the plan then matches this case in
[DataSourceAnalysis|https://github.com/apache/spark/blob/v3.0.0/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala#L166-L228].

(In Spark 2.4.3, a HiveTableRelation is not converted to a LogicalRelation when
the table is partitioned, so there the error only occurs for a non-partitioned
table doing an insert overwrite from itself.)
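Once the plan goes down the LogicalRelation path, the file-based insert command rejects any overwrite whose output location also appears among the read paths. A minimal, hypothetical sketch of that logic (simplified names, plain strings instead of Hadoop Path objects; not the exact Spark source):

```scala
// Hypothetical sketch of the overwrite-safety check on the
// LogicalRelation path. The table root location is compared, not the
// single partition directory being written, so reading dt='2020-09-09'
// while overwriting dt='2020-09-10' of the same table still matches.
object OverwriteCheck {
  def check(outputPath: String, inputPaths: Seq[String], overwrite: Boolean): Unit = {
    if (overwrite && inputPaths.contains(outputPath)) {
      throw new IllegalArgumentException(
        "Cannot overwrite a path that is also being read from.")
    }
  }
}
```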

The query succeeds when the conversion is disabled:
{code:java}
set spark.sql.hive.convertInsertingPartitionedTable=false;
insert overwrite table tmp.spark3_snap partition(dt='2020-09-10')
select id from tmp.spark3_snap where dt='2020-09-09';
{code}
I think this is a bug, because this scenario is a normal use case.

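The same workaround can also be applied programmatically; a hedged sketch, assuming a running SparkSession named {{spark}} with Hive support:

```scala
// Sketch (assumes an existing SparkSession `spark` with Hive support).
// Disabling the conversion keeps the insert on the Hive path, which
// does not run the "also being read from" path check.
spark.conf.set("spark.sql.hive.convertInsertingPartitionedTable", "false")

spark.sql(
  """insert overwrite table tmp.spark3_snap partition(dt='2020-09-10')
    |select id from tmp.spark3_snap where dt='2020-09-09'""".stripMargin)
```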


> Cannot insert overwrite a different partition of the same table
> ----------------------------------------------------------
>
>                 Key: SPARK-32838
>                 URL: https://issues.apache.org/jira/browse/SPARK-32838
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.0.0
>         Environment: hadoop 2.7 + spark 3.0.0
>            Reporter: CHC
>            Priority: Major
>
> When:
> {code:java}
> CREATE TABLE tmp.spark3_snap (
> id string
> )
> PARTITIONED BY (dt string)
> STORED AS ORC
> ;
> insert overwrite table tmp.spark3_snap partition(dt='2020-09-09')
> select 10;
> insert overwrite table tmp.spark3_snap partition(dt='2020-09-10')
> select 1;
> insert overwrite table tmp.spark3_snap partition(dt='2020-09-10')
> select id from tmp.spark3_snap where dt='2020-09-09';
> {code}
> and it fails with the error: "Cannot overwrite a path that is also being read
> from"
> related: https://issues.apache.org/jira/browse/SPARK-24194
> This works on Spark 2.4.3 and does not work on Spark 3.0.0
>   



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
