[ https://issues.apache.org/jira/browse/GOBBLIN-1602?focusedWorklogId=722199&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-722199 ]
ASF GitHub Bot logged work on GOBBLIN-1602:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 07/Feb/22 19:15
Start Date: 07/Feb/22 19:15
Worklog Time Spent: 10m
Work Description: ZihanLi58 commented on a change in pull request #3459:
URL: https://github.com/apache/gobblin/pull/3459#discussion_r800968438
##########
File path: .github/workflows/build_and_test.yaml
##########
@@ -124,6 +124,7 @@ jobs:
echo -e "$(ip addr show eth0 | grep "inet\b" | awk '{print $2}' | cut -d/ -f1)\t$(hostname -f) $(hostname -s)" | sudo tee -a /etc/hosts
- name: Verify mysql connection
run: |
+ sudo apt-get --fix-missing update
Review comment:
Does this change belong to another PR?
##########
File path: gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/hive/HiveUtils.java
##########
@@ -167,4 +169,19 @@ private static Configuration getHadoopConfiguration() {
public static boolean isPartitioned(Table table) {
return table.isPartitioned();
}
+
+ /**
+ * @param fs User configured filesystem of the target table
+ * @param userSpecifiedPath user specified path of the copy table location or partition
+ * @param existingTablePath path of an already registered Hive table or partition
+ * @return true if the filesystem resolves them to be equivalent, false otherwise
+ */
+ public static boolean areTablePathsEquivalent(FileSystem fs, Path userSpecifiedPath, Path existingTablePath) throws IOException {
+ try {
+ return fs.resolvePath(existingTablePath).equals(fs.resolvePath(userSpecifiedPath));
Review comment:
So after this change, do we require the fs to be a virtual FileSystem?
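
For illustration only, the "resolve both paths, then compare" idea in the
hunk above can be sketched with the JDK's local-filesystem API. This is not
the Hadoop FileSystem call used in the PR: java.nio's Path.toRealPath() is
swapped in as a stand-in for fs.resolvePath(), and the class name
PathEquivalenceSketch, the method name arePathsEquivalent, and the choice to
return false when a path cannot be resolved are all assumptions (the catch
branch is not visible in the quoted diff).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class PathEquivalenceSketch {

  // Hypothetical local-filesystem analogue of areTablePathsEquivalent:
  // canonicalize both paths, then compare the canonical forms.
  public static boolean arePathsEquivalent(Path userSpecifiedPath, Path existingTablePath) {
    try {
      return existingTablePath.toRealPath().equals(userSpecifiedPath.toRealPath());
    } catch (IOException e) {
      // Assumption: a path that cannot be resolved is treated as not equivalent.
      return false;
    }
  }

  public static void main(String[] args) throws IOException {
    Path table = Files.createTempDirectory("table");
    Path partition = Files.createDirectory(table.resolve("part"));

    // "<table>/part/.." is spelled differently but names the same directory as <table>.
    Path indirect = partition.resolve("..");

    System.out.println(arePathsEquivalent(indirect, table));
    System.out.println(arePathsEquivalent(partition, table));
  }
}
```

The comparison is made on the resolved paths rather than on the strings the
user supplied, which is the behaviour the issue description asks for when
two URIs point at the same location.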
##########
File path: gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/hive/HivePartitionFileSet.java
##########
@@ -99,7 +99,7 @@ public HivePartitionFileSet(HiveCopyEntityHelper hiveCopyEntityHelper, Partition
hiveCopyEntityHelper.getExistingEntityPolicy() != HiveCopyEntityHelper.ExistingEntityPolicy.REPLACE_TABLE_AND_PARTITIONS) {
log.error("Source and target partitions are not compatible. Aborting copy of partition " + this.partition, ioe);
- return Lists.newArrayList();
+ throw ioe;
Review comment:
If we trace up the call chain, is there a way to control whether this
exception fails the whole job? Should we instead collect the exceptions
without failing the job and, at the end of the job, use that information to
decide whether to fail it? (This can be done in another PR, though.)
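
A minimal sketch of the "collect now, decide at the end" pattern suggested in
this comment, using only the JDK. Nothing here is Gobblin API: the class
DeferredFailureSketch, the PartitionTask interface, and the
maxAllowedFailures threshold are hypothetical names introduced for
illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class DeferredFailureSketch {

  // Hypothetical stand-in for per-partition work that may throw.
  public interface PartitionTask {
    void run() throws IOException;
  }

  private final List<IOException> failures = new ArrayList<>();

  // Runs the task and records an IOException instead of propagating it.
  public void attempt(PartitionTask task) {
    try {
      task.run();
    } catch (IOException ioe) {
      failures.add(ioe);
    }
  }

  public int failureCount() {
    return failures.size();
  }

  // Called once at the end: throws only if more than maxAllowedFailures were recorded.
  public void failIfNeeded(int maxAllowedFailures) throws IOException {
    if (failures.size() <= maxAllowedFailures) {
      return;
    }
    IOException summary = new IOException(failures.size() + " partition(s) failed");
    failures.forEach(summary::addSuppressed); // keep every original cause attached
    throw summary;
  }

  public static void main(String[] args) {
    DeferredFailureSketch collector = new DeferredFailureSketch();
    collector.attempt(() -> { /* a partition that copies cleanly */ });
    collector.attempt(() -> { throw new IOException("incompatible partition"); });
    try {
      collector.failIfNeeded(0); // zero tolerance: any recorded failure fails the job
    } catch (IOException e) {
      System.out.println(e.getMessage() + ", suppressed: " + e.getSuppressed().length);
    }
  }
}
```

With a threshold of 0 this behaves like failing on any error, only later; a
larger threshold lets the job finish when a few partitions are incompatible.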
Issue Time Tracking
-------------------
Worklog Id: (was: 722199)
Time Spent: 1h 40m (was: 1.5h)
> Handle hive table mismatch when paths are equivalent in the underlying FS
> -------------------------------------------------------------------------
>
> Key: GOBBLIN-1602
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1602
> Project: Apache Gobblin
> Issue Type: Task
> Components: gobblin-core
> Reporter: William Lo
> Assignee: Abhishek Tiwari
> Priority: Major
> Time Spent: 1h 40m
> Remaining Estimate: 0h
>
> In scenarios where the paths are equivalent in the underlying FS, hive copy
> should not treat these paths separately if the user provided URI does not
> match the hive registered URI