[ 
https://issues.apache.org/jira/browse/HIVE-25343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lantao Jin updated HIVE-25343:
------------------------------
    Description: 
Spark and Hive are often used together. When a user creates a view via Spark, 
Spark stores the view's output columns in the table properties, for example: 
 !Screen Shot 2021-07-19 at 15.36.29.png|width=80%!

If the user then runs "create or replace view" via Hive to change the schema, 
Hive does not clean up the old table properties added by Spark. When users 
later read the view via Spark, the schema appears unchanged, which is very 
confusing.
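
The mechanism can be illustrated with a small, self-contained sketch. It is only a model under stated assumptions, not Hive or Spark code: the dictionaries stand in for a metastore entry, the key names are modelled on Spark's view.query.out.* properties (the screenshot above shows the actual ones), and the helper names replace_view and columns_seen_by_spark are hypothetical. It assumes a reader that prefers the recorded output columns when they are present.

```python
# Hypothetical model of a metastore view entry: a column list plus a
# free-form table-properties map. These are not Hive or Spark classes.
SPARK_VIEW_PREFIX = "view.query.out."  # assumed key prefix, for illustration


def spark_view_properties(columns):
    """Build properties of the kind Spark records for a view's output columns."""
    props = {SPARK_VIEW_PREFIX + "numCols": str(len(columns))}
    for i, name in enumerate(columns):
        props[f"{SPARK_VIEW_PREFIX}col.{i}"] = name
    return props


def replace_view(old_view, new_columns, clean_old_properties):
    """Simulate CREATE OR REPLACE VIEW issued by an engine that does not
    write the Spark keys itself. Old properties are either carried over
    unchanged or dropped, depending on clean_old_properties."""
    props = {} if clean_old_properties else dict(old_view["properties"])
    return {"columns": list(new_columns), "properties": props}


def columns_seen_by_spark(view):
    """Assumed reader behaviour: use the recorded output columns if present,
    otherwise fall back to the stored column list."""
    props = view["properties"]
    num = props.get(SPARK_VIEW_PREFIX + "numCols")
    if num is None:
        return list(view["columns"])
    return [props[f"{SPARK_VIEW_PREFIX}col.{i}"] for i in range(int(num))]


view = {"columns": ["a", "b"], "properties": spark_view_properties(["a", "b"])}

# Current behaviour described in this issue: old properties survive the replace.
stale = replace_view(view, ["a", "b", "c"], clean_old_properties=False)
# Proposed behaviour: old properties are cleaned on replace.
clean = replace_view(view, ["a", "b", "c"], clean_old_properties=True)

print(columns_seen_by_spark(stale))  # ['a', 'b']
print(columns_seen_by_spark(clean))  # ['a', 'b', 'c']
```

In the sketch, the stored column list is updated in both cases; only the leftover properties make the reader report the old schema, which matches the symptom described above.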

How to reproduce:
{code}
spark-sql> create table lajin_table (a int, b int) stored as parquet;
spark-sql> create view lajin_view as select * from lajin_table;
spark-sql> desc lajin_view;
a       int     NULL    NULL
b       int     NULL    NULL

hive> desc lajin_view;
a                       int
b                       int
hive> create or replace view lajin_view as select a, b, 3 as c from lajin_table;
hive> desc lajin_view;
a                       int
b                       int
c                       int

spark-sql> desc lajin_view; -- not changed
a       int     NULL    NULL
b       int     NULL    NULL
{code}

  was:
Spark and Hive are often used together. When a user creates a view via Spark, 
Spark stores the view's output columns in the table properties, for example: 
 !Screen Shot 2021-07-19 at 15.36.29.png|width=80!

If the user then runs "create or replace view" via Hive to change the schema, 
Hive does not clean up the old table properties added by Spark. When users 
later read the view via Spark, the schema appears unchanged, which is very 
confusing.


> Create or replace view should clean the old table properties
> ------------------------------------------------------------
>
>                 Key: HIVE-25343
>                 URL: https://issues.apache.org/jira/browse/HIVE-25343
>             Project: Hive
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.1.3, 3.2.0
>            Reporter: Lantao Jin
>            Assignee: Lantao Jin
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: Screen Shot 2021-07-19 at 15.36.29.png
>
>          Time Spent: 10m
>  Remaining Estimate: 0h



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
