pvary commented on a change in pull request #2921:
URL: https://github.com/apache/hive/pull/2921#discussion_r793293292
##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java
##########
@@ -97,6 +101,40 @@ public MoveTask() {
super();
}
+ public void flattenUnionSubdirectories(Path sourcePath) throws HiveException {
+ try {
+ FileSystem fs = sourcePath.getFileSystem(conf);
+ LOG.info("Checking " + sourcePath + " for subdirectories to flatten");
+ Set<Path> unionSubdirs = new HashSet<>();
+ if (fs.exists(sourcePath)) {
+ RemoteIterator<LocatedFileStatus> i = fs.listFiles(sourcePath, true);
+ String prefix = AbstractFileMergeOperator.UNION_SUDBIR_PREFIX;
+ while (i.hasNext()) {
+ Path path = i.next().getPath();
+ Path parent = path.getParent();
+ if (parent.getName().startsWith(prefix)) {
+ // We do rename by including the name of parent directory into the filename so that there are no clashes
+ // when we move the files to the parent directory. Ex. HIVE_UNION_SUBDIR_1/000000_0 -> 1_000000_0
+ String parentOfParent = parent.getParent().toString();
+ String parentNameSuffix = parent.getName().substring(prefix.length());
+
+ fs.rename(path, new Path(parentOfParent + "/" + parentNameSuffix + "_" + path.getName()));
Review comment:
What happens if this filename is already in use?
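To make the question concrete, here is a minimal sketch of a collision-aware flattening pass. It is not the patch's code: it uses `java.nio.file` as a stand-in for Hadoop's `FileSystem` (whose `rename` reports failure through a boolean return value, which the diff above does not check), and it assumes the prefix value `HIVE_UNION_SUBDIR_` based on the example in the diff comment. The class name `FlattenUnionSketch` and its methods are hypothetical. The sketch keeps the diff's naming scheme but refuses to overwrite an existing target and reports each clash to the caller.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FlattenUnionSketch {
    // Assumed value of AbstractFileMergeOperator.UNION_SUDBIR_PREFIX, inferred
    // from the "HIVE_UNION_SUBDIR_1/000000_0" example in the diff comment.
    static final String PREFIX = "HIVE_UNION_SUBDIR_";

    // Same naming scheme as the diff: HIVE_UNION_SUBDIR_1/000000_0 -> 1_000000_0.
    static String flattenedName(String parentName, String fileName) {
        return parentName.substring(PREFIX.length()) + "_" + fileName;
    }

    // Moves files out of union subdirectories of root into each subdirectory's parent.
    // Files.move without REPLACE_EXISTING throws FileAlreadyExistsException when the
    // target exists, so on a clash both files are left untouched and the clashing
    // target is returned to the caller instead of being silently lost.
    static List<Path> flatten(Path root) throws IOException {
        List<Path> files;
        // Collect first, then move, so the walk never observes its own renames.
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        List<Path> clashes = new ArrayList<>();
        for (Path file : files) {
            Path parent = file.getParent();
            String parentName = parent.getFileName().toString();
            if (!parentName.startsWith(PREFIX) || parent.getParent() == null) {
                continue;
            }
            Path target = parent.getParent()
                    .resolve(flattenedName(parentName, file.getFileName().toString()));
            try {
                Files.move(file, target);
            } catch (FileAlreadyExistsException e) {
                clashes.add(target);
            }
        }
        return clashes;
    }

    // Builds a small tree in which one flattened name is already taken and
    // returns the file names of the targets that clashed.
    static List<String> demo() {
        try {
            Path root = Files.createTempDirectory("union");
            for (String sub : new String[] {"1", "2"}) {
                Path dir = Files.createDirectories(root.resolve(PREFIX + sub));
                Files.write(dir.resolve("000000_0"), sub.getBytes(StandardCharsets.UTF_8));
            }
            // Pre-existing file that happens to use the flattened name 1_000000_0.
            Files.write(root.resolve("1_000000_0"), "existing".getBytes(StandardCharsets.UTF_8));
            List<String> names = new ArrayList<>();
            for (Path p : flatten(root)) {
                names.add(p.getFileName().toString());
            }
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("clashes: " + demo());
    }
}
```

Running `main` prints `clashes: [1_000000_0]`: `HIVE_UNION_SUBDIR_2/000000_0` is moved to `2_000000_0`, while `HIVE_UNION_SUBDIR_1/000000_0` stays in place because its target already exists. Whether the real fix should fail the task, pick a fresh name, or check the boolean returned by `fs.rename` is a design decision for the patch; the sketch only shows that the clash can be detected rather than ignored.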
--
This is an automated message from the Apache Git Service.