[jira] [Comment Edited] (HIVE-27970) Single Hive table partitioning to multiple storage system- (e.g, S3 and HDFS)

Butao Zhang (Jira) Wed, 10 Jan 2024 01:45:53 -0800


    [ https://issues.apache.org/jira/browse/HIVE-27970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805034#comment-17805034 ]


Butao Zhang edited comment on HIVE-27970 at 1/10/24 9:44 AM:
-------------------------------------------------------------

[~zhixingheyi-tian] Hi, I just tested the master branch (Hive 4.0) and found 
that it works as expected. That is to say, in Hive 4 a partition can use a 
different path scheme from its table, and a partition located on S3 is not 
switched back to the table's HDFS path after data is inserted.
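
To check this on your side, here is a minimal sketch, assuming the table and 
partition from the issue description (the inserted values are made up for 
illustration):

{code:sql}
-- Write a row into the partition whose location is on S3.
INSERT INTO htable PARTITION (p='p1') VALUES ('a1', 'b1');
-- The "Location" field in the output shows where the partition lives.
DESCRIBE FORMATTED htable PARTITION (p='p1');
{code}

If the location still points to the s3a:// path after the insert, the 
partition has not been moved back to the table's HDFS path.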

But you reported this issue against Hive 3, so I think the relevant code has 
changed since then. This logic has been refactored and optimized a lot in 
Hive 4, so I cannot tell you which commit changed the behavior.

 

I have attached my test log. Test environment: Hive master branch, Hadoop 
3.3.1, Tez 0.10.2.

[^hive4_test_partition_on_s3.txt]

Please check whether my test is correct. Thanks.


> Single Hive table partitioning to multiple storage system- (e.g, S3 and HDFS)
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-27970
>                 URL: https://issues.apache.org/jira/browse/HIVE-27970
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 3.1.2
>            Reporter: zhixingheyi-tian
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: hive4_test_partition_on_s3.txt
>
>
> Partitioning a single Hive/Datasource table across multiple storage systems 
> (e.g., S3 and HDFS).
> For a Hive table:
>  
> {code:java}
> CREATE TABLE htable (a string, b string) PARTITIONED BY (p string)
> LOCATION 'hdfs://{cluster}/user/hadoop/htable/';
> ALTER TABLE htable ADD PARTITION (p='p1')
> LOCATION 's3a://{bucketname}/usr/hive/warehouse/htable/p=p1';
> {code}
>  
> With INSERT INTO or INSERT OVERWRITE on htable, new data for partition 
> "p=p1" is written to the table's location rather than to the partition's 
> S3 location. This does not meet our requirements.
> Is there a best practice for this? Or is there a plan to support this feature?
> Thanks!



