Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-3577, IMPALA-3486: Partitions on multiple filesystems 
breaks with S3_SKIP_INSERT_STAGING
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3146/7/be/src/exec/hdfs-table-sink.cc
File be/src/exec/hdfs-table-sink.cc:

Line 298:   RETURN_IF_ERROR(HdfsFsCache::instance()->GetConnection(
> it's not obvious why this is correct.  What's wrong with the old location, 
The problem is that in BuildHdfsFileNames(), the temporary and final file 
names get their schemes from different sources:

tmp_hdfs_file_name_prefix:
https://github.com/cloudera/Impala/blob/cdh5-trunk/be/src/exec/hdfs-table-sink.cc#L258

where staging_dir_ gets the scheme from the base dir:
https://github.com/cloudera/Impala/blob/cdh5-trunk/be/src/exec/hdfs-table-sink.cc#L137

If we explicitly specify a partition location, final_hdfs_file_name_prefix 
gets its scheme from that location:
https://github.com/cloudera/Impala/blob/cdh5-trunk/be/src/exec/hdfs-table-sink.cc#L271


So, the tmp file name gets its location from the base table and the final file 
name gets its location from the user-specified location (if specified).
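
To make the mismatch concrete, here is a minimal standalone sketch. It is not 
the actual Impala code: FileNamePrefixes, SchemeOf() and BuildPrefixes() are 
hypothetical helpers, and the staging subdirectory name and URIs are only 
illustrative.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified stand-in for the two prefixes that
// BuildHdfsFileNames() produces for an output partition.
struct FileNamePrefixes {
  std::string tmp_prefix;    // derived from the staging dir (base table dir)
  std::string final_prefix;  // derived from the partition location, if given
};

// Returns the scheme of a URI such as "s3a://bucket/path", or "" if none.
std::string SchemeOf(const std::string& uri) {
  const std::string::size_type pos = uri.find("://");
  return pos == std::string::npos ? "" : uri.substr(0, pos);
}

// Assumption: the staging dir always lives under the base table dir, while
// the final prefix follows the explicit partition location when there is one.
FileNamePrefixes BuildPrefixes(const std::string& base_dir,
                               const std::string& partition_location) {
  FileNamePrefixes p;
  p.tmp_prefix = base_dir + "/_impala_insert_staging/";
  p.final_prefix =
      (partition_location.empty() ? base_dir : partition_location) + "/";
  return p;
}
```

With a base dir on HDFS and a partition location on S3, SchemeOf() reports 
"hdfs" for tmp_prefix and "s3a" for final_prefix, which is the situation 
described above.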

However, in previous patch sets, we got the connection at L389 (just after the 
call to BuildHdfsFileNames()), where we could only base it on either 
tmp_hdfs_file_name_prefix or final_hdfs_file_name_prefix.

If we choose to get a connection to 'tmp_hdfs_file_name_prefix' for a table on 
HDFS with a partition on S3, and skip insert staging for S3, we will be trying 
to write to S3 with a connection to HDFS. (Because tmp_hdfs_file_name_prefix 
always points to the base table.)

If we choose to get a connection to 'final_hdfs_file_name_prefix' for a table 
on HDFS with a partition on S3, and we do not skip insert staging for S3, we 
will be trying to write to HDFS (as the staging dir will be on HDFS) with a 
connection to S3.

So the only option I saw was to get the connection to "current_file_name" as 
that is the final file we end up writing to.
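
The two failure cases and the fix can be sketched as follows. This is a toy 
model, not the sink's real code: Fs, FsOf(), CurrentFileName() and 
ConnectionFor() are hypothetical names, and the rule that staging is skipped 
only when the final location is on S3 is my assumption about how the option 
is applied.

```cpp
#include <cassert>
#include <string>

enum class Fs { HDFS, S3 };

// Hypothetical: classify a URI by its scheme prefix.
Fs FsOf(const std::string& uri) {
  return uri.rfind("s3a://", 0) == 0 ? Fs::S3 : Fs::HDFS;
}

// Hypothetical: the file we actually write to. Assumption: staging is
// skipped only when the option is set and the final location is on S3.
std::string CurrentFileName(const std::string& tmp_name,
                            const std::string& final_name,
                            bool s3_skip_insert_staging) {
  const bool write_direct =
      s3_skip_insert_staging && FsOf(final_name) == Fs::S3;
  return write_direct ? final_name : tmp_name;
}

// Stand-in for looking up a connection: returns the filesystem that a
// connection obtained for this name would talk to.
Fs ConnectionFor(const std::string& file_name) { return FsOf(file_name); }
```

For a table on HDFS with a partition on S3, a connection keyed on the tmp 
name is wrong when staging is skipped, one keyed on the final name is wrong 
when staging is used, and one keyed on the current file name matches the 
write target in both cases.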


-- 
To view, visit http://gerrit.cloudera.org:8080/3146
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib13b610eb9efb68c83894786cea862d7eae43aa7
Gerrit-PatchSet: 7
Gerrit-Project: Impala
Gerrit-Branch: cdh5-2.6.0_5.8.0
Gerrit-Owner: Sailesh Mukil <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Sailesh Mukil <[email protected]>
Gerrit-HasComments: Yes
