Thomas Poepping updated HIVE-22928:
    Attachment: HIVE-22928.3.patch

> Allow hive.exec.stagingdir to be a fully qualified directory name
> -----------------------------------------------------------------
>                 Key: HIVE-22928
>                 URL: https://issues.apache.org/jira/browse/HIVE-22928
>             Project: Hive
>          Issue Type: Improvement
>          Components: Configuration, Hive
>    Affects Versions: 3.1.2
>            Reporter: Thomas Poepping
>            Assignee: Thomas Poepping
>            Priority: Minor
>         Attachments: HIVE-22928.2.patch, HIVE-22928.3.patch, HIVE-22928.patch
> Currently, {{hive.exec.stagingdir}} can only be set as a relative directory 
> name that, for operations like {{insert}} or {{insert overwrite}}, will be 
> placed either under the table directory or the partition directory. 
> For cases where an HDFS cluster is small but the data being inserted is very 
> large (greater than the capacity of the HDFS cluster, as mentioned in a 
> comment by [~ashutoshc] on [HIVE-14270]), the client may want to set their 
> staging directory to be an explicit blobstore path (or any filesystem path), 
> rather than relying on Hive to intelligently build the blobstore path based 
> on an interpretation of the job. This may lose locality guarantees, but 
> because renames on blobstores are equally expensive regardless of the 
> prefix, the loss is considered small (assuming only blobstore customers 
> use this functionality).
> Note that {{hive.blobstore.use.blobstore.as.scratchdir}} does not 
> suffice in this case, because the staging directory is distinct from the 
> scratch directory that setting controls.
> This commit enables Hive customers to set an absolute location for all 
> staging directories. For instances where the configured stagingdir scheme is 
> not the same as the scheme for the table location, the default stagingdir 
> configuration is used. This avoids a cross-filesystem rename, which is 
> impossible anyway.
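
The fallback rule described above can be sketched roughly as follows. This is an illustration only, not the code in the attached patch: the class and method names are hypothetical, it uses plain java.net.URI instead of Hadoop's Path and FileSystem classes, and it assumes that a value without a scheme is a relative directory name. The constant mirrors the default value of hive.exec.stagingdir.

```java
import java.net.URI;

public class StagingDirSketch {
    // Default value of hive.exec.stagingdir: a relative directory name.
    static final String DEFAULT_STAGING_DIR = ".hive-staging";

    // Hypothetical helper: choose the staging location for a write to tableLocation.
    static URI resolveStagingDir(String configured, URI tableLocation) {
        URI candidate = URI.create(configured);
        // Assumption: only values that carry a scheme count as fully qualified.
        boolean fullyQualified = candidate.getScheme() != null;

        // Same scheme as the table: use the configured location as-is.
        if (fullyQualified
                && candidate.getScheme().equalsIgnoreCase(tableLocation.getScheme())) {
            return candidate;
        }

        // Different scheme: fall back to the default relative name, which
        // avoids a rename across filesystems. A relative value is kept.
        String relative = fullyQualified ? DEFAULT_STAGING_DIR : configured;
        String base = tableLocation.toString();
        if (!base.endsWith("/")) {
            base += "/";
        }
        // Relative names are placed under the table (or partition) directory.
        return URI.create(base + relative);
    }

    public static void main(String[] args) {
        URI table = URI.create("s3a://warehouse/db/tbl");
        // Matching scheme: the fully qualified value is used directly.
        System.out.println(resolveStagingDir("s3a://scratch/staging", table));
        // Mismatched scheme: falls back under the table directory.
        System.out.println(resolveStagingDir("hdfs://nn:8020/tmp/staging", table));
        // Relative name: existing behaviour, placed under the table directory.
        System.out.println(resolveStagingDir(".hive-staging", table));
    }
}
```

The bucket names and namenode address in main are made up for the example. Comparing only the scheme follows the wording of the description; a real implementation might also need to compare the authority (bucket or namenode).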

This message was sent by Atlassian Jira