Sailesh Mukil has posted comments on this change. Change subject: IMPALA-1878: Support INSERT and LOAD DATA on S3 and between filesystems ......................................................................
Patch Set 10: (12 comments) http://gerrit.cloudera.org:8080/#/c/2574/10/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: Line 839: There is no directory structure in S3, so "inheriting" permissions is not : // possible. > S3 has per-object permissions (in the form of ACLs see http://docs.aws.amaz That is possible, but since there is no directory structure, we don't have anywhere to inherit permissions from. An alternative way of doing it would be to save the partition's default permissions in a hidden file or something similar (just basically someplace to hold the state), and every time we insert, we "inherit" permissions from this file. Or maybe just stat an existing object and copy over those permissions (this could have other complications though). I'll leave this as a TODO for now and address it in a following patch. Line 875: stringstream ss; : ss << "Error(s) deleting partition directories. First error (of " : << partition_create_ops.errors().size() << ") was: " << err.second; > While you are here, could you clean up he use of stringstream in favour of Done http://gerrit.cloudera.org:8080/#/c/2574/10/be/src/util/hdfs-bulk-ops.cc File be/src/util/hdfs-bulk-ops.cc: Line 111: .c_str() > not really your change, but I think you can get rid of .c_str() in all the Done Line 128: string error_msg = connection_status.ok() ? GetStrErrMsg() : > put the line break here, and put the whole ternary statement on the next li Done Line 156: if (connection_cache != NULL) connection_cache_ = connection_cache; > You should do this even if connection_cache is NULL - the caller might be t Done http://gerrit.cloudera.org:8080/#/c/2574/10/be/src/util/hdfs-bulk-ops.h File be/src/util/hdfs-bulk-ops.h: Line 94: /// Constructs a new operation set. > comment is now redundant, can be removed. Done Line 109: /// If abort_on_error is true, execution will finish after the first error seen. > update comment to explain connection_cache Done Line 121: return connection_cache_; > one line Done Line 130: Promise<bool> promise_; > Can you replace this with a CountingBarrier? I think that will be more natu I think the reason a promise was used here instead of a counting barrier was because a CountingBarrier cannot return an error. And we need the promise to notify us of an error if it does happen. I can modify CountingBarrier to maintain its current behavior and optionally return an error if necessary. Should I do that instead? http://gerrit.cloudera.org:8080/#/c/2574/10/be/src/util/hdfs-util.cc File be/src/util/hdfs-util.cc: Line 77: // TODO: 's3a' is the smallest scheme that our filesystem supports, so we compare the : // first 6 characters. Change if we support a filesystem with a smaller scheme. > Comment is out-of-date Done Line 85: // HDFS path. > this isn't a valid assumption after Anuj's change. Once Anuj pushes his change, I'll rebase and submit the final version. (Or he can rebase if I get this patch in first) http://gerrit.cloudera.org:8080/#/c/2574/10/be/src/util/hdfs-util.h File be/src/util/hdfs-util.h: Line 45: IsDfsPath > I think this should be IsHdfsPath(...), because other filesystems technical Done -- To view, visit http://gerrit.cloudera.org:8080/2574 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I94e15ad67752dce21c9b7c1dced6e114905a942d Gerrit-PatchSet: 10 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Sailesh Mukil <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Henry Robinson <[email protected]> Gerrit-Reviewer: Michael Brown <[email protected]> Gerrit-Reviewer: Mostafa Mokhtar <[email protected]> Gerrit-Reviewer: Sailesh Mukil <[email protected]> Gerrit-Reviewer: Taras Bobrovytsky <[email protected]> Gerrit-HasComments: Yes
