[ https://issues.apache.org/jira/browse/TAJO-1067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14159416#comment-14159416 ]
ASF GitHub Bot commented on TAJO-1067:
--------------------------------------
GitHub user hyunsik commented on a diff in the pull request:
https://github.com/apache/tajo/pull/161#discussion_r18433154
--- Diff: tajo-core/src/main/java/org/apache/tajo/master/querymaster/Query.java ---
@@ -486,6 +532,65 @@ public Path commitOutputData(Query query) {
return finalOutputDir;
}
+ /**
+ * This method recursively builds a rename map from staging directories to final output directories.
+ * If some data files already exist, this method deletes them to avoid duplicate data.
+ *
+ *
+ * @param fs
+ * @param stagingPath
+ * @param outputPath
+ * @param stagingParentPathString
+ * @throws IOException
+ */
+ private void visitPartitionedDirectory(FileSystem fs, Path stagingPath, Path outputPath,
+ String stagingParentPathString,
+ Map<Path, Path> renameDirs, Path oldTableDir) throws IOException {
+ FileStatus[] files = fs.listStatus(stagingPath);
+
+ for(FileStatus eachFile : files) {
+ if (eachFile.isDirectory()) {
+ Path oldPath = eachFile.getPath();
+
+ // Make recover directory.
+ String recoverPathString = oldPath.toString().replaceAll(stagingParentPathString,
+ oldTableDir.toString());
+ Path recoveryPath = new Path(recoverPathString);
+ if (!fs.exists(recoveryPath)) {
+ fs.mkdirs(recoveryPath);
+ }
+
+ visitPartitionedDirectory(fs, eachFile.getPath(), outputPath, stagingParentPathString,
+ renameDirs, oldTableDir);
+ // Find last order partition for renaming
+ String newPathString = oldPath.toString().replaceAll(stagingParentPathString,
+ outputPath.toString());
+ Path newPath = new Path(newPathString);
+ if (!hasDirectory(fs, eachFile.getPath())) {
--- End diff --
In my view, ```isLeafDirectory()``` would be a more intuitive name than
```!hasDirectory()```.
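For illustration only, here is a minimal sketch of what such a helper could look like. It uses java.nio.file as a stand-in so that it runs without Hadoop. The class name LeafDirectoryCheck and the method body are assumptions, not code from the pull request. In Query.java the same check would presumably go through fs.listStatus() and FileStatus.isDirectory() instead.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in class; not part of the Tajo code base.
public class LeafDirectoryCheck {

  /**
   * Returns true when the given directory contains no subdirectories,
   * i.e. it is the last level of a partition hierarchy.
   */
  static boolean isLeafDirectory(Path dir) throws IOException {
    try (DirectoryStream<Path> children = Files.newDirectoryStream(dir)) {
      for (Path child : children) {
        if (Files.isDirectory(child)) {
          return false; // found a nested directory, so this is not a leaf
        }
      }
    }
    return true;
  }

  public static void main(String[] args) throws IOException {
    // Build a tiny layout: <tmp>/col1=1/part-00000
    Path root = Files.createTempDirectory("staging");
    Path partition = Files.createDirectory(root.resolve("col1=1"));
    Files.createFile(partition.resolve("part-00000"));

    System.out.println(isLeafDirectory(root));      // false
    System.out.println(isLeafDirectory(partition)); // true
  }
}
```

With a helper named this way, the call site reads as a positive condition, if (isLeafDirectory(...)), rather than a negated one.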
> INSERT OVERWRITE INTO should not remove all partitions.
> -------------------------------------------------------
>
> Key: TAJO-1067
> URL: https://issues.apache.org/jira/browse/TAJO-1067
> Project: Tajo
> Issue Type: Bug
> Components: query master
> Reporter: Jaehwa Jung
> Assignee: Jaehwa Jung
> Priority: Critical
> Fix For: 0.9.0
>
>
> Currently, INSERT OVERWRITE INTO always moves the result data into the
> original table location. As a result, all existing partitions are
> removed. The query should not remove all partitions, because existing
> partitions may hold data that a production cluster depends on.