Henry Robinson has posted comments on this change.

Change subject: IMPALA-1878: Support INSERT and LOAD DATA on S3 and between 
filesystems
......................................................................


Patch Set 10:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/2574/10/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

Line 839: There is no directory structure in S3, so "inheriting" permissions is 
not
        :           // possible.
S3 has per-object permissions (in the form of ACLs see 
http://docs.aws.amazon.com/AmazonS3/latest/dev/S3_ACLs_UsingACLs.html). So 
what's to stop us mimicking inherited permissions? Even though there's no 
directory structure, we could insure that permissions flow down the partition 
hierarchy.

I'm happy to leave this as a TODO, but check my reasoning first.


Line 875: stringstream ss;
        :           ss << "Error(s) deleting partition directories. First error 
(of "
        :              << partition_create_ops.errors().size() << ") was: " << 
err.second;
While you are here, could you clean up he use of stringstream in favour of 
Substitute()?


http://gerrit.cloudera.org:8080/#/c/2574/10/be/src/util/hdfs-bulk-ops.cc
File be/src/util/hdfs-bulk-ops.cc:

Line 111: .c_str()
not really your change, but I think you can get rid of .c_str() in all the 
logging messages


Line 128:     string error_msg = connection_status.ok() ? GetStrErrMsg() :
put the line break here, and put the whole ternary statement on the next line 
if it fits


Line 156:   if (connection_cache != NULL) connection_cache_ = connection_cache;
You should do this even if connection_cache is NULL - the caller might be 
trying to explicitly say "don't use a connection cache".


http://gerrit.cloudera.org:8080/#/c/2574/10/be/src/util/hdfs-bulk-ops.h
File be/src/util/hdfs-bulk-ops.h:

Line 94:   /// Constructs a new operation set.
comment is now redundant, can be removed.


Line 109:   /// If abort_on_error is true, execution will finish after the 
first error seen.
update comment to explain connection_cache


Line 121:     return connection_cache_;
one line


Line 130:   Promise<bool> promise_;
Can you replace this with a CountingBarrier? I think that will be more natural 
and will simplify some of the logic.


http://gerrit.cloudera.org:8080/#/c/2574/10/be/src/util/hdfs-util.cc
File be/src/util/hdfs-util.cc:

Line 77: // TODO: 's3a' is the smallest scheme that our filesystem supports, so 
we compare the
       :   // first 6 characters. Change if we support a filesystem with a 
smaller scheme.
Comment is out-of-date


http://gerrit.cloudera.org:8080/#/c/2574/10/be/src/util/hdfs-util.h
File be/src/util/hdfs-util.h:

Line 45: IsDfsPath
I think this should be IsHdfsPath(...), because other filesystems technically 
implement the DFS abstraction.


-- 
To view, visit http://gerrit.cloudera.org:8080/2574
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I94e15ad67752dce21c9b7c1dced6e114905a942d
Gerrit-PatchSet: 10
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Sailesh Mukil <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Henry Robinson <[email protected]>
Gerrit-Reviewer: Michael Brown <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-Reviewer: Sailesh Mukil <[email protected]>
Gerrit-Reviewer: Taras Bobrovytsky <[email protected]>
Gerrit-HasComments: Yes

Reply via email to