Dan Hecht has posted comments on this change.

Change subject: IMPALA-1878: Support INSERT and LOAD DATA on S3 and between 
filesystems
......................................................................


Patch Set 13:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/2574/13/be/src/exec/hdfs-table-sink.cc
File be/src/exec/hdfs-table-sink.cc:

Line 649:   // that file, it still does not "exist" on S3 and hdfsGetPathInfo() 
will return an error
this comment is kind of confusing because it explains why running this code is 
okay first, and not until later does it explain why the special case is needed 
at all.   How about rephrasing as below (note however, that it's possible that 
one day S3 will have per-file block sizes, and even variable block sizes):

For S3A, when creating files, hdfsGetPathInfo() cannot be used until the file 
is closed. Since S3A is really an object store, all files have the same "block" 
size, which is the same as the default block size.


http://gerrit.cloudera.org:8080/#/c/2574/13/be/src/util/hdfs-bulk-ops.cc
File be/src/util/hdfs-bulk-ops.cc:

Line 154:   connection_cache_ = connection_cache;
if we're going to stash this as a member variable, why not just pass it during 
construction? does it change between Execute() calls?


http://gerrit.cloudera.org:8080/#/c/2574/13/be/src/util/hdfs-util.cc
File be/src/util/hdfs-util.cc:

Line 91:   return strncmp(path_a, path_b, scheme_a_len) == 0;
it needs to also check the namenode (or bucket, etc). i.e.:

hdfs://namenode0:8020/my/file
hdfs://namenode1:8020/other/file
hdfs://namenode0:8021/another/file

are all different filesystems. And you won't be able to do that without knowing 
the defaultfs (i.e. Anuj's patch).


http://gerrit.cloudera.org:8080/#/c/2574/13/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

Line 400:  URI with the scheme 
I think a more standard way to say it is: "Fully qualified URI".

But how do we guarantee that this is actually fully qualified?


-- 
To view, visit http://gerrit.cloudera.org:8080/2574
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I94e15ad67752dce21c9b7c1dced6e114905a942d
Gerrit-PatchSet: 13
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Sailesh Mukil <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Henry Robinson <[email protected]>
Gerrit-Reviewer: Michael Brown <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-Reviewer: Sailesh Mukil <[email protected]>
Gerrit-Reviewer: Taras Bobrovytsky <[email protected]>
Gerrit-HasComments: Yes

Reply via email to