Henry Robinson has posted comments on this change. Change subject: IMPALA-1878: Support INSERT and LOAD DATA on S3 and between filesystems ......................................................................
Patch Set 10: (11 comments) http://gerrit.cloudera.org:8080/#/c/2574/10/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: Line 839: There is no directory structure in S3, so "inheriting" permissions is not : // possible. S3 has per-object permissions (in the form of ACLs see http://docs.aws.amazon.com/AmazonS3/latest/dev/S3_ACLs_UsingACLs.html). So what's to stop us mimicking inherited permissions? Even though there's no directory structure, we could insure that permissions flow down the partition hierarchy. I'm happy to leave this as a TODO, but check my reasoning first. Line 875: stringstream ss; : ss << "Error(s) deleting partition directories. First error (of " : << partition_create_ops.errors().size() << ") was: " << err.second; While you are here, could you clean up he use of stringstream in favour of Substitute()? http://gerrit.cloudera.org:8080/#/c/2574/10/be/src/util/hdfs-bulk-ops.cc File be/src/util/hdfs-bulk-ops.cc: Line 111: .c_str() not really your change, but I think you can get rid of .c_str() in all the logging messages Line 128: string error_msg = connection_status.ok() ? GetStrErrMsg() : put the line break here, and put the whole ternary statement on the next line if it fits Line 156: if (connection_cache != NULL) connection_cache_ = connection_cache; You should do this even if connection_cache is NULL - the caller might be trying to explicitly say "don't use a connection cache". http://gerrit.cloudera.org:8080/#/c/2574/10/be/src/util/hdfs-bulk-ops.h File be/src/util/hdfs-bulk-ops.h: Line 94: /// Constructs a new operation set. comment is now redundant, can be removed. Line 109: /// If abort_on_error is true, execution will finish after the first error seen. update comment to explain connection_cache Line 121: return connection_cache_; one line Line 130: Promise<bool> promise_; Can you replace this with a CountingBarrier? I think that will be more natural and will simplify some of the logic. http://gerrit.cloudera.org:8080/#/c/2574/10/be/src/util/hdfs-util.cc File be/src/util/hdfs-util.cc: Line 77: // TODO: 's3a' is the smallest scheme that our filesystem supports, so we compare the : // first 6 characters. Change if we support a filesystem with a smaller scheme. Comment is out-of-date http://gerrit.cloudera.org:8080/#/c/2574/10/be/src/util/hdfs-util.h File be/src/util/hdfs-util.h: Line 45: IsDfsPath I think this should be IsHdfsPath(...), because other filesystems technically implement the DFS abstraction. -- To view, visit http://gerrit.cloudera.org:8080/2574 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I94e15ad67752dce21c9b7c1dced6e114905a942d Gerrit-PatchSet: 10 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Sailesh Mukil <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Henry Robinson <[email protected]> Gerrit-Reviewer: Michael Brown <[email protected]> Gerrit-Reviewer: Mostafa Mokhtar <[email protected]> Gerrit-Reviewer: Sailesh Mukil <[email protected]> Gerrit-Reviewer: Taras Bobrovytsky <[email protected]> Gerrit-HasComments: Yes
