Henry Robinson has posted comments on this change.

Change subject: IMPALA-3452: S3: Disable Impala staging for INSERTs via flag 
for speedup
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/2905/1/be/src/exec/hdfs-table-sink.cc
File be/src/exec/hdfs-table-sink.cc:

Line 52: DEFINE_bool(s3_skip_insert_staging, false, "Enable to skip the staging 
step for INSERTs "
> I recommend changing this to a query option.
The idea behind staging and having the coordinator do the final move is to 
allow individual workers to complete their writes before publishing the 
results. 

That means if there are any errors during the query (e.g. scan of a malformed 
file) they are caught before the writes are published. Just having local 
staging doesn't fix this: it's having the coordinator act like a distributed 
barrier that does.
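To make the barrier concrete, here is a minimal C++ sketch. It is not the actual hdfs-table-sink.cc or coordinator code: `WorkerResult` and `FinalizeInsert` are hypothetical names. The sketch assumes each worker has already written its files to a private staging location and reported success or failure, and it shows the coordinator publishing either every staged file or none of them.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical report sent by one worker once it has finished writing its
// output files into a private staging location.
struct WorkerResult {
  bool ok;                                // false if the worker hit an error
  std::vector<std::string> staged_files;  // files written under staging
};

// Coordinator acting as a distributed barrier: it runs only after every
// worker has reported, and publishes all staged files or none of them.
// Returns the files made visible in the final table location.
std::vector<std::string> FinalizeInsert(const std::vector<WorkerResult>& results) {
  assert(!results.empty());  // an INSERT is expected to have at least one worker
  // Phase 1: check every report; one failure (e.g. a scan of a malformed
  // file) aborts the INSERT and nothing is published.
  for (const WorkerResult& r : results) {
    if (!r.ok) return {};
  }
  // Phase 2: all workers succeeded, so "move" each staged file into place.
  std::vector<std::string> published;
  for (const WorkerResult& r : results) {
    published.insert(published.end(), r.staged_files.begin(), r.staged_files.end());
  }
  return published;
}
```

Because the publish step only runs after the last report arrives, an error in any single worker keeps all of the writes invisible. Skipping staging gives up that guarantee.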

Having this two-stage process also smooths out the effects of skew on the 
'partial write window'.
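One way to read the skew point, sketched under simplifying assumptions. `PartialWriteWindow` and `move_seconds` are hypothetical, and publication is modelled as happening the moment a worker finishes. Without staging, readers can see a partial result from the first worker's finish to the last, so the window grows with skew. With staging, the window is only the coordinator's final move phase.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical model of the 'partial write window': how long readers might
// see only part of an INSERT's output. finish_times are the moments (seconds
// since query start) at which each worker finished writing.
double PartialWriteWindow(const std::vector<double>& finish_times, bool staged,
                          double move_seconds) {
  assert(!finish_times.empty());
  if (staged) {
    // Nothing is visible until the last worker finishes; the window is only
    // the coordinator's final move phase, regardless of skew.
    return move_seconds;
  }
  // Without staging each worker publishes as soon as it finishes, so the
  // window spans the first to the last finish and grows with skew.
  auto range = std::minmax_element(finish_times.begin(), finish_times.end());
  return *range.second - *range.first;
}
```

For finish times of 10 s, 12 s and 40 s, the unstaged window is 30 s, while the staged window is however long the move takes. S3 has no rename operation, so each "move" is an object copy. That is why the move phase, and therefore the staged window, is itself large on S3.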

For S3 this matters less: the write latency is so high that the partial-write 
window is very large anyway, so I'm in favour of this option. I think a query 
option is best as well, because the behaviour you want is workload-dependent. 
It would be ok for the option to default to 'true'.
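A rough sketch of the query-option approach, under stated assumptions. `QueryOptions`, `IsS3Path` and `UseInsertStaging` are hypothetical names that do not reflect how Impala actually declares query options, and S3 tables are assumed to be recognised by an `s3a://` location prefix. Staging stays on for non-S3 tables. On S3 it is skipped by default, and an individual query can turn it back on.

```cpp
#include <cassert>
#include <string>

// Hypothetical per-query options: the skip-staging switch lives here rather
// than in a process-wide flag, so each query can choose its own behaviour.
struct QueryOptions {
  bool s3_skip_insert_staging = true;  // default 'true', as suggested in review
};

// Hypothetical helper: treats locations with an s3a:// prefix as S3 tables.
bool IsS3Path(const std::string& path) {
  return path.rfind("s3a://", 0) == 0;  // prefix check
}

// Decides whether an INSERT should write to a staging directory first.
bool UseInsertStaging(const std::string& table_location, const QueryOptions& opts) {
  assert(!table_location.empty());  // every table is assumed to have a location
  if (!IsS3Path(table_location)) return true;  // keep staging off S3
  return !opts.s3_skip_insert_staging;
}
```

Making this a per-query option rather than a process-wide `DEFINE_bool` lets a workload that needs all-or-nothing INSERTs on S3 opt back into staging without changing a setting that affects every other query.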


Line 289: so via a flag "blah" ****
fix this


-- 
To view, visit http://gerrit.cloudera.org:8080/2905

Gerrit-MessageType: comment
Gerrit-Change-Id: Iff9620d41ba0d5fb1aa0c9f4abb48866fc2b0698
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Sailesh Mukil
Gerrit-Reviewer: Henry Robinson
Gerrit-Reviewer: Marcel Kornacker
Gerrit-Reviewer: Mostafa Mokhtar
Gerrit-Reviewer: Sailesh Mukil
Gerrit-HasComments: Yes
