Dan Hecht has posted comments on this change.

Change subject: IMPALA-1878: Support INSERT and LOAD DATA on S3 and between 
filesystems
......................................................................


Patch Set 21:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/2574/21/be/src/exec/hdfs-table-sink.cc
File be/src/exec/hdfs-table-sink.cc:

Line 652: block_size
On some HDFS implementations I think it's possible for HDFS to override the 
requested block size, and so this block_size and the one in mBlockSize might 
not match. So, how about we call this field "requested_block_size" to indicate 
that this is the block size that was requested, not the true block size of the 
file?

Alternatively, we could move this function's code to CreateNewTmpFile() and 
then save away the true block size for hdfs.  And then just make a 
OutputPartition::block_size() accessor that the parquet writer could use.  That 
might be simplest, but would add a stat() call on non-parquet paths, but that 
doesn't seem like it should matter.


http://gerrit.cloudera.org:8080/#/c/2574/21/be/src/exec/hdfs-table-sink.h
File be/src/exec/hdfs-table-sink.h:

Line 88: The block size decided on for this file.
The block size requested when creating the file.  (unless you go with the 
alternate suggestion).


http://gerrit.cloudera.org:8080/#/c/2574/21/be/src/util/hdfs-util-test.cc
File be/src/util/hdfs-util-test.cc:

Line 44:                                 
"hdfs://namenode_2/temp_dir2/temp_path_2"));
what about the 'port' cases?


http://gerrit.cloudera.org:8080/#/c/2574/21/be/src/util/hdfs-util.cc
File be/src/util/hdfs-util.cc:

Line 89: after_default_fs_namenode
why is this called "after". Isn't this going to point at the default FS 
namenode?


Line 90:   after_default_fs_namenode += 3;
3? i'm not sure that works all the time. i think file:/path/to/my/file is 
legal...


Line 105: "://".
see above.


Line 108: after_authority_a
isn't this analogous to after_default_fs_namenode but for path 'a'?  If so, 
they should be named consistently (i.e. namenode vs authority).


Line 112: (
extraneous parenthesis


Line 115: 
it seems like this code is parsing three URIs in the same way.  It might be 
clearer to have a helper subroutine that breaks a URI down into it's components 
that you can call three times.


-- 
To view, visit http://gerrit.cloudera.org:8080/2574
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I94e15ad67752dce21c9b7c1dced6e114905a942d
Gerrit-PatchSet: 21
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Sailesh Mukil <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Henry Robinson <[email protected]>
Gerrit-Reviewer: Michael Brown <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-Reviewer: Sailesh Mukil <[email protected]>
Gerrit-Reviewer: Taras Bobrovytsky <[email protected]>
Gerrit-HasComments: Yes

Reply via email to