Henry Robinson has posted comments on this change.

Change subject: IMPALA-3452: S3: Disable Impala staging for INSERTs via flag for speedup
......................................................................

Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/2905/1/be/src/exec/hdfs-table-sink.cc
File be/src/exec/hdfs-table-sink.cc:

Line 52: DEFINE_bool(s3_skip_insert_staging, false, "Enable to skip the staging step for INSERTs "
> I recommend changing to query option.

The idea behind staging and having the coordinator do the final move is to allow individual workers to complete their writes before publishing the results. That means if there are any errors during the query (e.g. scan of a malformed file) they are caught before the writes are published. Just having local staging doesn't fix this: it's having the coordinator act like a distributed barrier that does.

Having this two-stage process also smooths out the effects of skew on the 'partial write window'.

For S3 this doesn't matter as much: the write latency is so high that the partial-write window is very large anyway, so I'm in favour of this option. I think a query option is best as well, because the behaviour you want is workload-dependent. It would be ok for the option to default to 'true'.

Line 289: so via a flag "blah"
**** fix this

-- 
To view, visit http://gerrit.cloudera.org:8080/2905
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iff9620d41ba0d5fb1aa0c9f4abb48866fc2b0698
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Sailesh Mukil <[email protected]>
Gerrit-Reviewer: Henry Robinson <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-Reviewer: Sailesh Mukil <[email protected]>
Gerrit-HasComments: Yes
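
For illustration only, the staging argument in the comment on line 52 can be sketched as a toy model. This is not Impala code: `FileStore`, `WorkerResult`, `RunInsert` and the `staging/` and `final/` prefixes are hypothetical names, workers run sequentially here rather than concurrently, and an in-memory map stands in for the filesystem. It only shows why the coordinator's final move acts as a barrier, and what skipping it gives up.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a filesystem: path -> file contents.
using FileStore = std::map<std::string, std::string>;

// Hypothetical outcome of one worker's share of an INSERT.
struct WorkerResult {
  std::string file_name;
  std::string data;
  bool failed;  // e.g. the worker scanned a malformed file
};

// Toy model of an INSERT. Returns true if every worker succeeded.
// With skip_staging == false, workers write under "staging/" and the
// coordinator publishes to "final/" only after all of them have finished.
// With skip_staging == true, workers write straight to "final/".
bool RunInsert(const std::vector<WorkerResult>& workers, bool skip_staging,
               FileStore* fs) {
  assert(fs != nullptr);
  const std::string kStaging = "staging/";
  const std::string kFinal = "final/";
  const std::string& write_dir = skip_staging ? kFinal : kStaging;

  bool ok = true;
  for (const WorkerResult& w : workers) {
    if (w.failed) {
      ok = false;  // Earlier workers have already written their files.
      break;
    }
    (*fs)[write_dir + w.file_name] = w.data;
  }
  // Without staging there is no barrier: partial output stays visible.
  if (skip_staging) return ok;

  // Coordinator step: collect staged paths first so the map is not
  // modified while it is being iterated.
  std::vector<std::string> staged;
  for (const auto& entry : *fs) {
    if (entry.first.compare(0, kStaging.size(), kStaging) == 0) {
      staged.push_back(entry.first);
    }
  }
  // Publish on success, discard on failure; staging is emptied either way.
  for (const std::string& path : staged) {
    if (ok) (*fs)[kFinal + path.substr(kStaging.size())] = (*fs)[path];
    fs->erase(path);
  }
  return ok;
}
```

In this model a failing worker leaves nothing under `final/` when staging is on, while with staging skipped the files written before the failure remain published. That is the trade-off the option would expose per query.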
