Dan Hecht has posted comments on this change. Change subject: IMPALA-1878: Support INSERT and LOAD DATA on S3 and between filesystems ......................................................................
Patch Set 13: (4 comments) http://gerrit.cloudera.org:8080/#/c/2574/13/be/src/exec/hdfs-table-sink.cc File be/src/exec/hdfs-table-sink.cc: Line 649: // that file, it still does not "exist" on S3 and hdfsGetPathInfo() will return an error this comment is kind of confusing because it explains why running this code is okay first, and not until later does it explain why the special case is needed at all. How about rephrasing as below (note however, that it's possible that one day S3 will have per-file block sizes, and even variable block sizes): For S3A, when creating files, hdfsGetPathInfo() cannot be used until the file is closed. Since S3A is really an object store, all files have the same "block" size, which is the same as the default block size. http://gerrit.cloudera.org:8080/#/c/2574/13/be/src/util/hdfs-bulk-ops.cc File be/src/util/hdfs-bulk-ops.cc: Line 154: connection_cache_ = connection_cache; if we're going to stash this as a member variable, why not just pass it during construction? does it change between Execute() calls? http://gerrit.cloudera.org:8080/#/c/2574/13/be/src/util/hdfs-util.cc File be/src/util/hdfs-util.cc: Line 91: return strncmp(path_a, path_b, scheme_a_len) == 0; it needs to also check the namenode (or bucket, etc). i.e.: hdfs://namenode0:8020/my/file hdfs://namenode1:8020/other/file hdfs://namenode0:8021/another/file are all different filesystems. And you won't be able to do that without knowing the defaultfs (i.e. Anuj's patch). http://gerrit.cloudera.org:8080/#/c/2574/13/common/thrift/ImpalaInternalService.thrift File common/thrift/ImpalaInternalService.thrift: Line 400: URI with the scheme I think a more standard way to say it is: "Fully qualified URI". But how do we guarantee that this is actually fully qualified? -- To view, visit http://gerrit.cloudera.org:8080/2574 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I94e15ad67752dce21c9b7c1dced6e114905a942d Gerrit-PatchSet: 13 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Sailesh Mukil <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Henry Robinson <[email protected]> Gerrit-Reviewer: Michael Brown <[email protected]> Gerrit-Reviewer: Mostafa Mokhtar <[email protected]> Gerrit-Reviewer: Sailesh Mukil <[email protected]> Gerrit-Reviewer: Taras Bobrovytsky <[email protected]> Gerrit-HasComments: Yes
