[ https://issues.apache.org/jira/browse/TAJO-1067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155897#comment-14155897 ]

ASF GitHub Bot commented on TAJO-1067:
--------------------------------------

Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/161#discussion_r18317824
  
    --- Diff: tajo-core/src/main/java/org/apache/tajo/master/querymaster/Query.java ---
    @@ -432,19 +432,68 @@ public Path commitOutputData(Query query) {
                 boolean movedToOldTable = false;
                 boolean committed = false;
                 Path oldTableDir = new Path(queryContext.getStagingDir(), TajoConstants.INSERT_OVERWIRTE_OLD_TABLE_NAME);
    -            try {
    -              if (fs.exists(finalOutputDir)) {
    -                fs.rename(finalOutputDir, oldTableDir);
    -                movedToOldTable = fs.exists(oldTableDir);
    -              } else { // if the parent does not exist, make its parent directory.
    -                fs.mkdirs(finalOutputDir.getParent());
    +
    +            // INSERT OVERWRITE INTO always moves the result data into the original table location.
    +            // As a result, all existing partitions have been removed. The query should not remove all partitions
    +            // because existing partitions may be a data-set for a production cluster.
    +            if (queryContext.hasPartition()) {
    +              Map<Path, Path> renameDirs = TUtil.newHashMap();
    +              Map<Path, Path> recoveryDirs = TUtil.newHashMap();
    +
    +              try {
    +                if (!fs.exists(finalOutputDir)) {
    +                  fs.mkdirs(finalOutputDir);
    +                }
    +
    +                visitPartitionedDirectory(fs, stagingResultDir, finalOutputDir, stagingResultDir.toString(),
    +                    renameDirs, oldTableDir);
    +
    +                // Rename target partition directories
    +                for(Map.Entry<Path, Path> entry : renameDirs.entrySet()) {
    +                  // Backup existing data files for recovering
    +                  if (fs.exists(entry.getValue())) {
    +                    String recoveryPathString = entry.getValue().toString().replaceAll(finalOutputDir.toString(),
    +                        oldTableDir.toString());
    +                    Path recoveryPath = new Path(recoveryPathString);
    +                    fs.rename(entry.getValue(), recoveryPath);
    +                    fs.exists(recoveryPath);
    +                    recoveryDirs.put(entry.getValue(), recoveryPath);
    +                  }
    +                  // Delete existing directory
    +                  fs.deleteOnExit(entry.getValue());
    +                  // Rename staging directory to final output directory
    +                  fs.rename(entry.getKey(), entry.getValue());
    +                }
    +
    +              } catch (IOException ioe) {
    +                // Remove created dirs
    +                for(Map.Entry<Path, Path> entry : renameDirs.entrySet()) {
    +                  fs.deleteOnExit(entry.getValue());
    --- End diff --
    
    Could you check the use of FileSystem::deleteOnExit? You seem to misuse it.
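
To illustrate the concern: Hadoop's `FileSystem#deleteOnExit(Path)` only marks a path for deletion when the `FileSystem` is closed, so the directory still exists when the following `rename` runs. An immediate removal would use `FileSystem#delete(Path, boolean)` instead. The sketch below is not the patch's code. It assumes `java.io.File#deleteOnExit` and `java.nio.file.Files#delete` as local stand-ins for the Hadoop calls (so it runs without Hadoop on the classpath), and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Assumption: java.io.File#deleteOnExit (deferred until JVM exit) stands in for
// Hadoop's FileSystem#deleteOnExit (deferred until the FileSystem is closed).
// Class and method names here are hypothetical, for illustration only.
class DeleteOnExitDemo {

  /** Returns true if the path still exists right after deleteOnExit is requested. */
  static boolean existsAfterDeleteOnExit(Path dir) {
    dir.toFile().deleteOnExit(); // only registers the path for deletion at shutdown
    return Files.exists(dir);
  }

  /** Returns true if the path still exists after an immediate delete. */
  static boolean existsAfterDelete(Path dir) throws IOException {
    Files.delete(dir); // removes the (empty) directory right away
    return Files.exists(dir);
  }

  public static void main(String[] args) throws IOException {
    Path deferred = Files.createTempDirectory("partition-deferred");
    Path immediate = Files.createTempDirectory("partition-immediate");
    System.out.println("after deleteOnExit: exists=" + existsAfterDeleteOnExit(deferred));
    System.out.println("after delete: exists=" + existsAfterDelete(immediate));
  }
}
```

In the stand-in, the directory is still present after `deleteOnExit` and gone after `delete`. By analogy, in the diff above the target partition directory would still exist when `fs.rename(entry.getKey(), entry.getValue())` is called.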


> INSERT OVERWRITE INTO should not remove all partitions.
> -------------------------------------------------------
>
>                 Key: TAJO-1067
>                 URL: https://issues.apache.org/jira/browse/TAJO-1067
>             Project: Tajo
>          Issue Type: Bug
>          Components: query master
>            Reporter: Jaehwa Jung
>            Assignee: Jaehwa Jung
>            Priority: Critical
>             Fix For: 0.9.0
>
>
> Currently, INSERT OVERWRITE INTO always moves the result data into the 
> original table location. As a result, all existing partitions are removed. 
> The query should not remove all partitions, because existing partitions 
> may hold data that a production cluster depends on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
