Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-3577, IMPALA-3486: Partitions on multiple filesystems breaks with S3_SKIP_INSERT_STAGING
......................................................................
Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3146/7/be/src/exec/hdfs-table-sink.cc
File be/src/exec/hdfs-table-sink.cc:

Line 298: RETURN_IF_ERROR(HdfsFsCache::instance()->GetConnection(
> it's not obvious why this is correct. What's wrong with the old location,

The problem is that in BuildHdfsFileNames(), the temporary and final file names get their schemes from different sources.

tmp_hdfs_file_name_prefix (https://github.com/cloudera/Impala/blob/cdh5-trunk/be/src/exec/hdfs-table-sink.cc#L258) is built from staging_dir_, which gets its scheme from the base table directory (https://github.com/cloudera/Impala/blob/cdh5-trunk/be/src/exec/hdfs-table-sink.cc#L137).

final_hdfs_file_name_prefix (https://github.com/cloudera/Impala/blob/cdh5-trunk/be/src/exec/hdfs-table-sink.cc#L271) is built from the partition location when one is explicitly specified.

So the temporary file name always takes its location from the base table, while the final file name takes its location from the user-specified partition location (if there is one).

In previous patch sets, we got the connection at L389, just after the call to BuildHdfsFileNames(). At that point we could only base it on either tmp_hdfs_file_name_prefix or final_hdfs_file_name_prefix, and for a table on HDFS with a partition on S3 each choice breaks in one case:

If we get the connection for tmp_hdfs_file_name_prefix and insert staging is skipped for S3, we try to write to S3 with a connection to HDFS, because tmp_hdfs_file_name_prefix always points to the base table.

If we get the connection for final_hdfs_file_name_prefix and insert staging is not skipped, we try to write to HDFS (the staging dir is on HDFS) with a connection to S3.

So the only correct option I saw was to get the connection for current_file_name, since that is the file we actually end up writing to.
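To make the failure modes concrete, here is a minimal, self-contained C++ sketch. It is not Impala code: FsPrefix(), FileNames, CurrentFileName() and ConnectionMatches() are hypothetical names, the paths are made up, and "same filesystem" is approximated by comparing the scheme://authority prefix of two paths. It models the argument in the comment: a connection taken from either prefix alone is wrong in one of the two staging modes, while a connection derived from the file actually being written always matches.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns the "scheme://authority" prefix of a path such
// as "hdfs://nn:8020" or "s3a://bucket", or "" for a scheme-less path.
std::string FsPrefix(const std::string& path) {
  const std::string::size_type scheme_end = path.find("://");
  if (scheme_end == std::string::npos) return "";
  // If there is no '/' after the authority, find() returns npos and
  // substr() keeps the rest of the string.
  return path.substr(0, path.find('/', scheme_end + 3));
}

// Hypothetical model of the two names built in BuildHdfsFileNames().
struct FileNames {
  std::string tmp_file;    // under the staging dir, i.e. the base table's FS
  std::string final_file;  // under the partition location, if one was given
};

// The file that is actually opened for writing: the final file when staging
// is skipped, otherwise the temporary file in the staging dir.
const std::string& CurrentFileName(const FileNames& names, bool skip_staging) {
  // Both names are expected to be built before any file is opened.
  assert(!names.tmp_file.empty() && !names.final_file.empty());
  return skip_staging ? names.final_file : names.tmp_file;
}

// True if a connection opened for conn_path targets the same filesystem as
// the file about to be written at write_path.
bool ConnectionMatches(const std::string& conn_path,
                       const std::string& write_path) {
  return FsPrefix(conn_path) == FsPrefix(write_path);
}
```

For example, with tmp_file set to "hdfs://nn:8020/warehouse/t/_impala_insert_staging/q1/f0.tmp" and final_file set to "s3a://bucket/t/p=1/f0", ConnectionMatches(names.tmp_file, CurrentFileName(names, true)) is false, which is the "HDFS connection used to write to S3" case. Passing CurrentFileName(names, skip_staging) as the connection path matches in both modes by construction.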
-- 
To view, visit http://gerrit.cloudera.org:8080/3146

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib13b610eb9efb68c83894786cea862d7eae43aa7
Gerrit-PatchSet: 7
Gerrit-Project: Impala
Gerrit-Branch: cdh5-2.6.0_5.8.0
Gerrit-Owner: Sailesh Mukil
Gerrit-Reviewer: Dan Hecht
Gerrit-Reviewer: Sailesh Mukil
Gerrit-HasComments: Yes
