[
https://issues.apache.org/jira/browse/HIVE-27970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805113#comment-17805113
]
Butao Zhang edited comment on HIVE-27970 at 1/10/24 12:43 PM:
--------------------------------------------------------------
1) What is the meaning of "insert into different locations"? In my test, the
partition location is on S3 and the table location is on HDFS.
2) "insert overwrite" runs successfully, as expected.
3) *set hive.metastore.dml.events=true* causes the insert to fail with the
following error. I will try to figure out and resolve this error.
{code:java}
Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException: Wrong FS: s3a://tpcds-csy/tests3/testhivetbl/dt=2024/000000_0_copy_3, expected: hdfs://127.0.0.1:8028
    at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[?:1.8.0_291]
    at java.util.concurrent.FutureTask.get(FutureTask.java:192) ~[?:1.8.0_291]
    at org.apache.hadoop.hive.ql.metadata.Hive.loadDynamicPartitions(Hive.java:3297) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.MoveTask.handleDynParts(MoveTask.java:697) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:545) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:214) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:354) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:327) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:244) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:105) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:367) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:205) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:154) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:149) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:185) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
    at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:236) ~[hive-service-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
    ... 11 more
Caused by: java.lang.IllegalArgumentException: Wrong FS: s3a://tpcds-csy/tests3/testhivetbl/dt=2024/000000_0_copy_3, expected: hdfs://127.0.0.1:8028
    at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:807) ~[hadoop-common-3.3.1.jar:?]
    at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:253) ~[hadoop-hdfs-client-3.3.1.jar:?]
    at org.apache.hadoop.hdfs.DistributedFileSystem$33.doCall(DistributedFileSystem.java:1880) ~[hadoop-hdfs-client-3.3.1.jar:?]
    at org.apache.hadoop.hdfs.DistributedFileSystem$33.doCall(DistributedFileSystem.java:1876) ~[hadoop-hdfs-client-3.3.1.jar:?]
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) ~[hadoop-common-3.3.1.jar:?]
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:1888) ~[hadoop-hdfs-client-3.3.1.jar:?]
    at org.apache.hadoop.hive.ql.metadata.Hive.addInsertNonDirectoryInformation(Hive.java:3960) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.metadata.Hive.addInsertFileInformation(Hive.java:3933) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.metadata.Hive.fireInsertEvent(Hive.java:3900) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.metadata.Hive.loadPartitionInternal(Hive.java:2783) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.metadata.Hive.lambda$loadDynamicPartitions$6(Hive.java:3246) ~[hive-exec-4.0.0-beta
{code}
was (Author: zhangbutao):
1) What is the meaning of "insert into different locations"? In my test, the
partition location is on S3 and the table location is on HDFS.
2) "insert overwrite" runs successfully, as expected.
3) *set hive.metastore.dml.events=true* causes the insert to fail with the
following error. I will try to figure out and resolve this error.
{code:java}
2024-01-10T20:29:07,231 ERROR [load-dynamic-partitionsToAdd-0] metadata.Hive: Exception when loading partition with parameters partPath=hdfs://127.0.0.1:8028/user/hive/warehouse/hiveicetest/testdb.db/testhivetbl/.hive-staging_hive_2024-01-10_20-29-00_928_7189786252210098785-15/-ext-10000/dt=2024, table=testhivetbl, partSpec={dt=2024}, loadFileType=KEEP_EXISTING, listBucketingLevel=0, isAcid=false, resetStatistics=false
java.lang.IllegalArgumentException: Wrong FS: s3a://tpcds-csy/tests3/testhivetbl/dt=2024/000000_0_copy_3, expected: hdfs://100.71.36.43:8028
    at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:807) ~[hadoop-common-3.3.1.jar:?]
    at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:253) ~[hadoop-hdfs-client-3.3.1.jar:?]
    at org.apache.hadoop.hdfs.DistributedFileSystem$33.doCall(DistributedFileSystem.java:1880) ~[hadoop-hdfs-client-3.3.1.jar:?]
    at org.apache.hadoop.hdfs.DistributedFileSystem$33.doCall(DistributedFileSystem.java:1876) ~[hadoop-hdfs-client-3.3.1.jar:?]
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) ~[hadoop-common-3.3.1.jar:?]
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:1888) ~[hadoop-hdfs-client-3.3.1.jar:?]
    at org.apache.hadoop.hive.ql.metadata.Hive.addInsertNonDirectoryInformation(Hive.java:3960) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.metadata.Hive.addInsertFileInformation(Hive.java:3933) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.metadata.Hive.fireInsertEvent(Hive.java:3900) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.metadata.Hive.loadPartitionInternal(Hive.java:2783) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.metadata.Hive.lambda$loadDynamicPartitions$6(Hive.java:3246) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_291]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_291]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_291]
    at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_291]
{code}
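For reference, the mixed-location setup behind these logs can be sketched as follows. The table, partition, and bucket names are taken from the error messages above; the column names and the source table {{src_table}} are placeholders:

{code:sql}
SET hive.metastore.dml.events=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- table location on HDFS ...
CREATE TABLE testhivetbl (a string) PARTITIONED BY (dt string)
LOCATION 'hdfs://127.0.0.1:8028/user/hive/warehouse/testhivetbl';

-- ... with one partition relocated to S3
ALTER TABLE testhivetbl ADD PARTITION (dt='2024')
LOCATION 's3a://tpcds-csy/tests3/testhivetbl/dt=2024';

-- dynamic-partition INSERT INTO fails with "Wrong FS" once dml.events is on;
-- INSERT OVERWRITE, and INSERT INTO with dml.events=false, both succeed
INSERT INTO testhivetbl PARTITION (dt)
SELECT a, dt FROM src_table;
{code}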
> Single Hive table partitioning to multiple storage systems (e.g., S3 and HDFS)
> ------------------------------------------------------------------------------
>
> Key: HIVE-27970
> URL: https://issues.apache.org/jira/browse/HIVE-27970
> Project: Hive
> Issue Type: Improvement
> Affects Versions: 3.1.2
> Reporter: zhixingheyi-tian
> Priority: Major
> Labels: pull-request-available
> Attachments: hive4_test_partition_on_s3.txt
>
>
> Single Hive/Datasource table partitioning to multiple storage systems (e.g.,
> S3 and HDFS)
> For Hive table:
>
> {code:sql}
> CREATE TABLE htable (a string, b string) PARTITIONED BY (p string)
> LOCATION 'hdfs://{cluster}/user/hadoop/htable/';
>
> ALTER TABLE htable ADD PARTITION (p='p1')
> LOCATION 's3a://{bucketname}/usr/hive/warehouse/htable/p=p1';
> {code}
>
> When running "insert into htable" or "insert overwrite htable", the new data
> for "p=p1" is written under the table location instead of the partition's S3
> location. This does not meet the requirements.
> Is there any best practice? Or is there a plan to support this feature?
> Thanks!
--
This message was sent by Atlassian Jira
(v8.20.10#820010)