Sailesh Mukil has uploaded a new patch set (#3).

Change subject: IMPALA-3452: S3: Disable Impala staging for INSERTs via flag for speedup
......................................................................
IMPALA-3452: S3: Disable Impala staging for INSERTs via flag for speedup

INSERTs on S3 are slower because of double buffering: we buffer once locally and once in a staging directory in S3 before moving the file(s) to the final location. On HDFS, moving a file from the staging directory to the final location is a quick rename, which is only a metadata operation. On S3, however, renames are not supported, so the move becomes a full file copy instead of a metadata-only rename.

This patch introduces a boolean query option "s3_skip_insert_staging" which skips the staging step on S3 and allows the sinks to write to the final location directly. This trades consistency for performance: if one or more nodes fail during the query, we will end up with inconsistent results in the final location.

P.S: This option is disabled for INSERT OVERWRITE queries, as that would require cleaning the destination directory before moving the final files there. However, the coordinator is responsible for the cleaning, which takes place only after the table sinks have moved the files to the final location. Thus, INSERT OVERWRITE queries must still have their files moved to a staging location by the table sinks.

TODO: Record average performance gains here.
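The decision described above can be sketched roughly as follows. This is a minimal illustration, not the code in this patch: `InsertContext`, `ShouldSkipStaging`, and `ChooseWriteDir` are hypothetical names, and the assumption that the sink already knows whether the target is on S3 and whether the query is an INSERT OVERWRITE is mine.

```cpp
#include <string>

// Hypothetical, simplified view of what a table sink would need to know.
// These are not Impala's actual structures.
struct InsertContext {
  bool target_is_s3;            // destination table lives on S3
  bool is_overwrite;            // query is INSERT OVERWRITE
  bool s3_skip_insert_staging;  // value of the new query option
};

// Staging is skipped only when the option is set, the target is on S3,
// and the query is not an INSERT OVERWRITE (which still needs staging so
// the coordinator can clean the destination directory).
bool ShouldSkipStaging(const InsertContext& ctx) {
  return ctx.target_is_s3 && ctx.s3_skip_insert_staging && !ctx.is_overwrite;
}

// Returns the directory the sink should write its output files to.
std::string ChooseWriteDir(const InsertContext& ctx,
                           const std::string& final_dir,
                           const std::string& staging_dir) {
  return ShouldSkipStaging(ctx) ? final_dir : staging_dir;
}
```

Under these assumptions, a plain INSERT on S3 with the option enabled writes straight to the final directory, while INSERT OVERWRITE and non-S3 targets keep the existing staging path.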
Change-Id: Iff9620d41ba0d5fb1aa0c9f4abb48866fc2b0698
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
6 files changed, 60 insertions(+), 15 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/05/2905/3
--
To view, visit http://gerrit.cloudera.org:8080/2905
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iff9620d41ba0d5fb1aa0c9f4abb48866fc2b0698
Gerrit-PatchSet: 3
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Sailesh Mukil <[email protected]>
Gerrit-Reviewer: Henry Robinson <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-Reviewer: Sailesh Mukil <[email protected]>
