Dan Hecht has posted comments on this change.

Change subject: IMPALA-1878: Support INSERT and LOAD DATA on S3 and between 
filesystems
......................................................................


Patch Set 22:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/2574/22/be/src/exec/hdfs-table-sink.cc
File be/src/exec/hdfs-table-sink.cc:

Line 301:     // the file is not accessible until it's closed.
I guess we should clarify this comment further because of the fact that even if 
the file were closed, it would just return the default block size.  How about:

On S3A, the file cannot be stat'ed until after it's closed, and even so, the 
block size reported will be just the filesystem default.  So, remember the 
requested block size.


Line 312: 
nit: remove blank line - this is all one logical block of code.


http://gerrit.cloudera.org:8080/#/c/2574/22/be/src/exec/hdfs-table-sink.h
File be/src/exec/hdfs-table-sink.h:

Line 93: get_block_size
actually, since this is a struct, we can just access block_size. We're already 
doing that for all the other fields.


http://gerrit.cloudera.org:8080/#/c/2574/22/be/src/util/hdfs-util-test.cc
File be/src/util/hdfs-util-test.cc:

Line 60:                                 "temp_dir/temp_path"));
let's add tests for the file scheme case. see other comments for why that's 
special.


http://gerrit.cloudera.org:8080/#/c/2574/22/be/src/util/hdfs-util.cc
File be/src/util/hdfs-util.cc:

Line 81: getNamenodeLength
static int GetNamenodeLength ... and add a quick function header comment here 
since this isn't declared in a header.  Also, this returns the 
scheme://authority right?  Is that considered the "namenode" or is there a 
better term for that (I thought namenode was effectively a HDFS way of saying 
authority, but I could be wrong)?


Line 86: skip_delimiter
would it make sense to call this delimiter_len?  actually, since you use 
pointers for the other two markers, it might be clearer to do that here to:

// Scheme
after_delimiter = after_scheme + 2;
if (after_delimiter[0] == '/') after_delimiter++;

But really, I think if you see :/ rarther than ://, then the next thing isn't 
the authority. But this should happen only for file (where the / is really part 
of the path).  and I think we need to special case 'file' scheme anyway (see 
below), so we can probably special case that and just make this code skip over 
://


Line 91:   if (after_authority == NULL) return strlen(path);
I think for file scheme, we shouldn't treat the next thing as authority, right? 
i.e. file:/dir1, file://dir2, file:///dir3 are all the same filesystem. I'm 
pretty sure the Hadoop java filesystem resolution code treats these as all the 
same. I think we should just return 4 (i.e. "file") when scheme is file.


-- 
To view, visit http://gerrit.cloudera.org:8080/2574
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I94e15ad67752dce21c9b7c1dced6e114905a942d
Gerrit-PatchSet: 22
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Sailesh Mukil <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Henry Robinson <[email protected]>
Gerrit-Reviewer: Michael Brown <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-Reviewer: Sailesh Mukil <[email protected]>
Gerrit-Reviewer: Taras Bobrovytsky <[email protected]>
Gerrit-HasComments: Yes

Reply via email to