Dan Hecht has posted comments on this change. Change subject: IMPALA-1878: Support INSERT and LOAD DATA on S3 and between filesystems ......................................................................
Patch Set 22: (7 comments) http://gerrit.cloudera.org:8080/#/c/2574/22/be/src/exec/hdfs-table-sink.cc File be/src/exec/hdfs-table-sink.cc: Line 301: // the file is not accessible until it's closed. I guess we should clarify this comment further because of the fact that even if the file were closed, it would just return the default block size. How about: On S3A, the file cannot be stat'ed until after it's closed, and even so, the block size reported will be just the filesystem default. So, remember the requested block size. Line 312: nit: remove blank line - this is all one logical block of code. http://gerrit.cloudera.org:8080/#/c/2574/22/be/src/exec/hdfs-table-sink.h File be/src/exec/hdfs-table-sink.h: Line 93: get_block_size actually, since this is a struct, we can just access block_size. We're already doing that for all the other fields. http://gerrit.cloudera.org:8080/#/c/2574/22/be/src/util/hdfs-util-test.cc File be/src/util/hdfs-util-test.cc: Line 60: "temp_dir/temp_path")); let's add tests for the file scheme case. see other comments for why that's special. http://gerrit.cloudera.org:8080/#/c/2574/22/be/src/util/hdfs-util.cc File be/src/util/hdfs-util.cc: Line 81: getNamenodeLength static int GetNamenodeLength ... and add a quick function header comment here since this isn't declared in a header. Also, this returns the scheme://authority right? Is that considered the "namenode" or is there a better term for that (I thought namenode was effectively a HDFS way of saying authority, but I could be wrong)? Line 86: skip_delimiter would it make sense to call this delimiter_len? actually, since you use pointers for the other two markers, it might be clearer to do that here to: // Scheme after_delimiter = after_scheme + 2; if (after_delimiter[0] == '/') after_delimiter++; But really, I think if you see :/ rarther than ://, then the next thing isn't the authority. But this should happen only for file (where the / is really part of the path). and I think we need to special case 'file' scheme anyway (see below), so we can probably special case that and just make this code skip over :// Line 91: if (after_authority == NULL) return strlen(path); I think for file scheme, we shouldn't treat the next thing as authority, right? i.e. file:/dir1, file://dir2, file:///dir3 are all the same filesystem. I'm pretty sure the Hadoop java filesystem resolution code treats these as all the same. I think we should just return 4 (i.e. "file") when scheme is file. -- To view, visit http://gerrit.cloudera.org:8080/2574 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I94e15ad67752dce21c9b7c1dced6e114905a942d Gerrit-PatchSet: 22 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Sailesh Mukil <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Henry Robinson <[email protected]> Gerrit-Reviewer: Michael Brown <[email protected]> Gerrit-Reviewer: Mostafa Mokhtar <[email protected]> Gerrit-Reviewer: Sailesh Mukil <[email protected]> Gerrit-Reviewer: Taras Bobrovytsky <[email protected]> Gerrit-HasComments: Yes
