[ https://issues.apache.org/jira/browse/HIVE-27970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17907821#comment-17907821 ]
Ayush Saxena commented on HIVE-27970:
-------------------------------------
Regarding failure with hive.metastore.dml.events=false
I think the problem lies here:
[https://github.com/apache/hive/blob/20d26ad269af3c281f845df76d3b8d260cabc904/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L3913]
It creates a FileSystem from the Table location and then reuses it for
all the paths, assuming that every file path provided is within the same
FileSystem, which isn't the case here.
Something like this should fix it, I believe:
{noformat}
diff --git a/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java b/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
index f447aacdf7..59c6286fcd 100644
--- a/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
+++ b/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
@@ -3844,13 +3844,12 @@ public static void addWriteNotificationLog(HiveConf conf, Table tbl, List<String
Long txnId, Long writeId,
List<FileStatus> newFiles,
List<WriteNotificationLogRequest> requestList)
throws IOException, HiveException, TException {
- FileSystem fileSystem = tbl.getDataLocation().getFileSystem(conf);
InsertEventRequestData insertData = new InsertEventRequestData();
insertData.setReplace(true);
WriteNotificationLogRequest rqst = new WriteNotificationLogRequest(txnId, writeId,
tbl.getDbName(), tbl.getTableName(), insertData);
- addInsertFileInformation(newFiles, fileSystem, insertData);
+ addInsertFileInformation(newFiles, conf, insertData);
rqst.setPartitionVals(partitionVals);
if (requestList == null) {
@@ -3910,13 +3909,12 @@ private void fireInsertEvent(Table tbl, Map<String, String> partitionSpec, boole
return;
}
try {
- FileSystem fileSystem = tbl.getDataLocation().getFileSystem(conf);
FireEventRequestData data = new FireEventRequestData();
InsertEventRequestData insertData = new InsertEventRequestData();
insertData.setReplace(replace);
data.setInsertData(insertData);
if (newFiles != null && !newFiles.isEmpty()) {
- addInsertFileInformation(newFiles, fileSystem, insertData);
+ addInsertFileInformation(newFiles, conf, insertData);
} else {
insertData.setFilesAdded(new ArrayList<String>());
}
@@ -3938,7 +3936,7 @@ private void fireInsertEvent(Table tbl, Map<String, String> partitionSpec, boole
}
- private static void addInsertFileInformation(List<FileStatus> newFiles, FileSystem fileSystem,
+ private static void addInsertFileInformation(List<FileStatus> newFiles, Configuration conf,
InsertEventRequestData insertData) throws IOException {
LinkedList<Path> directories = null;
for (FileStatus status : newFiles) {
@@ -3949,7 +3947,7 @@ private static void addInsertFileInformation(List<FileStatus> newFiles, FileSyst
directories.add(status.getPath());
continue;
}
- addInsertNonDirectoryInformation(status.getPath(), fileSystem, insertData);
+ addInsertNonDirectoryInformation(status.getPath(), conf, insertData);
}
if (directories == null) {
return;
@@ -3958,7 +3956,7 @@ private static void addInsertFileInformation(List<FileStatus> newFiles, FileSyst
// are some examples where we would have 1, or few, levels respectively.
while (!directories.isEmpty()) {
Path dir = directories.poll();
- FileStatus[] contents = fileSystem.listStatus(dir);
+ FileStatus[] contents = dir.getFileSystem(conf).listStatus(dir);
if (contents == null) {
continue;
}
@@ -3967,15 +3965,16 @@ private static void addInsertFileInformation(List<FileStatus> newFiles, FileSyst
directories.add(status.getPath());
continue;
}
- addInsertNonDirectoryInformation(status.getPath(), fileSystem, insertData);
+ addInsertNonDirectoryInformation(status.getPath(), conf, insertData);
}
}
}
- private static void addInsertNonDirectoryInformation(Path p, FileSystem fileSystem,
+ private static void addInsertNonDirectoryInformation(Path p, Configuration conf,
InsertEventRequestData insertData) throws IOException {
insertData.addToFilesAdded(p.toString());
+ FileSystem fileSystem = p.getFileSystem(conf);
FileChecksum cksum = fileSystem.getFileChecksum(p);
String acidDirPath = AcidUtils.getFirstLevelAcidDirPath(p.getParent(), fileSystem);
// File checksum is not implemented for local filesystem (RawLocalFileSystem)
{noformat}
Though I haven't tried this yet for the use case mentioned.
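To make the idea behind the patch concrete without pulling in Hadoop, here is a minimal sketch. It is not Hive code: java.net.URI stands in for Path.getFileSystem(conf), and the class name, method names, namenode address and bucket name are all hypothetical. It only shows that resolving the file system per file, rather than once from the table location, separates files that live on different storage systems.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: java.net.URI stands in for Hadoop's Path/FileSystem,
// and every name in this class is hypothetical (not part of Hive).
class PerPathFileSystemSketch {

  // Stand-in for Path.getFileSystem(conf): a file system is identified
  // here by the scheme and authority of the path's URI.
  static String fileSystemKey(String path) {
    URI uri = URI.create(path);
    String authority = uri.getAuthority() == null ? "" : uri.getAuthority();
    return uri.getScheme() + "://" + authority;
  }

  // Resolve the file system for each file instead of once per table.
  static Map<String, List<String>> groupByFileSystem(List<String> newFiles) {
    Map<String, List<String>> byFs = new LinkedHashMap<>();
    for (String file : newFiles) {
      byFs.computeIfAbsent(fileSystemKey(file), k -> new ArrayList<>()).add(file);
    }
    return byFs;
  }

  public static void main(String[] args) {
    // Assumed example paths: one partition on HDFS, one on S3.
    List<String> newFiles = List.of(
        "hdfs://nn1:8020/user/hadoop/htable/p=p0/000000_0",
        "s3a://my-bucket/usr/hive/warehouse/htable/p=p1/000000_0");
    groupByFileSystem(newFiles)
        .forEach((fs, files) -> System.out.println(fs + " -> " + files));
  }
}
```

With a single table-level FileSystem, the s3a file above would be handed to the HDFS client; keyed per path, each file goes to its own file system.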
{quote}Maybe not. I am not sure if viewfs can be used for the s3a protocol.
{quote}
ViewFs works with every FileSystem, though we haven't tested the latest Hive
against it much. The path resolution that decides whether to copy or rename
might be broken in some places, because Hive usually just compares the URI
scheme to decide whether two file systems are the same, and hence whether to
copy or rename. For ViewFs we need to resolve the link first and then compare
the schemes. If I remember right, there are places where that logic isn't in
place for ViewFs.
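A rough sketch of the "resolve the link, then compare" point, again with plain java.net.URI rather than the Hadoop ViewFs API. The mount table is a hypothetical Map from mount point to target URI, and all names here are made up for illustration; it is not how Hive or ViewFs actually implement this.

```java
import java.net.URI;
import java.util.Map;
import java.util.Objects;

// Illustration only: a toy mount table replaces real ViewFs link resolution.
class ViewFsSchemeCheckSketch {

  // Naive check: same scheme and authority means "same file system".
  static boolean sameFileSystem(URI a, URI b) {
    return a.getScheme().equalsIgnoreCase(b.getScheme())
        && Objects.equals(a.getAuthority(), b.getAuthority());
  }

  // Map a viewfs path to its target using the (hypothetical) mount table;
  // non-viewfs paths and unmatched paths are returned unchanged.
  static URI resolve(URI path, Map<String, String> mounts) {
    if (!"viewfs".equals(path.getScheme())) {
      return path;
    }
    String p = path.getPath();
    for (Map.Entry<String, String> e : mounts.entrySet()) {
      String mountPoint = e.getKey();
      if (p.equals(mountPoint) || p.startsWith(mountPoint + "/")) {
        return URI.create(e.getValue() + p.substring(mountPoint.length()));
      }
    }
    return path;
  }

  // Rename is only considered when the resolved targets share a file system;
  // otherwise the data would have to be copied.
  static boolean canRename(URI src, URI dst, Map<String, String> mounts) {
    return sameFileSystem(resolve(src, mounts), resolve(dst, mounts));
  }

  public static void main(String[] args) {
    Map<String, String> mounts = Map.of(
        "/warm", "hdfs://nn1:8020/warm",
        "/cold", "s3a://my-bucket/cold");
    URI src = URI.create("viewfs://cluster/warm/t/f");
    URI dst = URI.create("viewfs://cluster/cold/t/f");
    System.out.println("naive same FS: " + sameFileSystem(src, dst));
    System.out.println("rename after resolving: " + canRename(src, dst, mounts));
  }
}
```

Both paths carry the viewfs scheme, so the naive comparison treats them as one file system, while the resolved targets are on hdfs and s3a and would need a copy.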
> Single Hive table partitioning to multiple storage system- (e.g, S3 and HDFS)
> -----------------------------------------------------------------------------
>
> Key: HIVE-27970
> URL: https://issues.apache.org/jira/browse/HIVE-27970
> Project: Hive
> Issue Type: Improvement
> Affects Versions: 3.1.2
> Reporter: zhixingheyi-tian
> Priority: Major
> Labels: pull-request-available
> Attachments: hive4_test_partition_on_s3.txt
>
>
> Single Hive/Datasource table partitioning to multiple storage system- (e.g,
> S3 and HDFS)
> For Hive table:
>
> {code:java}
> CREATE TABLE htable (a string, b string) PARTITIONED BY (p string)
> location "hdfs://{cluster}/user/hadoop/htable/";
> alter table htable add partition(p='p1') location
> 's3a://{bucketname}/usr/hive/warehouse/htable/p=p1';
> {code}
>
> When running INSERT INTO or INSERT OVERWRITE on htable, new data for “p=p1”
> is written to the table location's storage instead of the partition's
> location. This does not meet the requirements.
> Is there any best practise? Or is there a plan to support this feature?
> Thanks!