Dan Hecht has posted comments on this change. Change subject: IMPALA-1878: Support INSERT and LOAD DATA on S3 and between filesystems ......................................................................
Patch Set 21: (9 comments) http://gerrit.cloudera.org:8080/#/c/2574/21/be/src/exec/hdfs-table-sink.cc File be/src/exec/hdfs-table-sink.cc: Line 652: block_size On some HDFS implementations I think it's possible for HDFS to override the requested block size, and so this block_size and the one in mBlockSize might not match. So, how about we call this field "requested_block_size" to indicate that this is the block size that was requested, not the true block size of the file? Alternatively, we could move this function's code to CreateNewTmpFile() and then save away the true block size for hdfs. And then just make a OutputPartition::block_size() accessor that the parquet writer could use. That might be simplest, but would add a stat() call on non-parquet paths, but that doesn't seem like it should matter. http://gerrit.cloudera.org:8080/#/c/2574/21/be/src/exec/hdfs-table-sink.h File be/src/exec/hdfs-table-sink.h: Line 88: The block size decided on for this file. The block size requested when creating the file. (unless you go with the alternate suggestion). http://gerrit.cloudera.org:8080/#/c/2574/21/be/src/util/hdfs-util-test.cc File be/src/util/hdfs-util-test.cc: Line 44: "hdfs://namenode_2/temp_dir2/temp_path_2")); what about the 'port' cases? http://gerrit.cloudera.org:8080/#/c/2574/21/be/src/util/hdfs-util.cc File be/src/util/hdfs-util.cc: Line 89: after_default_fs_namenode why is this called "after". Isn't this going to point at the default FS namenode? Line 90: after_default_fs_namenode += 3; 3? i'm not sure that works all the time. i think file:/path/to/my/file is legal... Line 105: "://". see above. Line 108: after_authority_a isn't this analogous to after_default_fs_namenode but for path 'a'? If so, they should be named consistently (i.e. namenode vs authority). Line 112: ( extraneous parenthesis Line 115: it seems like this code is parsing three URIs in the same way. It might be clearer to have a helper subroutine that breaks a URI down into it's components that you can call three times. -- To view, visit http://gerrit.cloudera.org:8080/2574 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I94e15ad67752dce21c9b7c1dced6e114905a942d Gerrit-PatchSet: 21 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Sailesh Mukil <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Henry Robinson <[email protected]> Gerrit-Reviewer: Michael Brown <[email protected]> Gerrit-Reviewer: Mostafa Mokhtar <[email protected]> Gerrit-Reviewer: Sailesh Mukil <[email protected]> Gerrit-Reviewer: Taras Bobrovytsky <[email protected]> Gerrit-HasComments: Yes
