[
https://issues.apache.org/jira/browse/PIG-5079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802217#comment-15802217
]
Sushanth Sowmyan commented on PIG-5079:
---------------------------------------
Yup, looking through code and some history, it looks like overwrite is not
supported by HCatalog, and requires explicit drop-and-recreate. I was wrong
earlier.
With the append addition to HCatalog, HCatalog writes started behaving like
Hive INSERT-INTO: where data was already present, a write appended instead of
failing. That broke the workflow of people who expected HCatalog to fail in
those cases and relied on the failure to catch error scenarios. Thus,
"immutable"="true" was introduced so that people could continue depending on
HCatalog to error out if there was already data. To be consistent, Hive
INSERT-INTO behaviour was also changed so that it fails on an immutable table
that already has data. Thus, this does not help [~reddyppr] in his scenario.
Thus, an INSERT-OVERWRITE analogue for HCatalog would currently be a feature
request. A useful one, to be sure.
To be clear, the behaviour is now as follows:
a) By default, HCatalog will attempt to treat all writes as INSERT-INTO with
appending.
b) If the table has "immutable"="true", then append is disabled: if the
table/partition has no data, the insert works, but if it has data, the job
fails.
c) Appending is not supported with dynamic partitioning writes, so if a user
uses dynamic partitioning writes, the presence of data in any of the
partitions written to by the dynamic-partitioning job will result in a
runtime failure.
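As a rough illustration of (a)-(c), here is a toy Python model of what happens
to a write. This is not HCatalog code: the function name, its parameters and
the returned strings are made up for this sketch, and it assumes that a write
into a table/partition with no data simply goes through in every case.

```python
def hcat_write_outcome(immutable, has_data, dynamic_partitioning=False):
    """Toy model of rules (a)-(c); names and return values are hypothetical."""
    if not has_data:
        # Nothing to append to or collide with, so the write just lands.
        return "write"
    if dynamic_partitioning:
        # (c) append is not supported for dynamic partitioning writes.
        return "fail"
    if immutable:
        # (b) "immutable"="true" disables append, so existing data is an error.
        return "fail"
    # (a) default: behaves like INSERT-INTO and appends.
    return "append"


if __name__ == "__main__":
    for immutable in (False, True):
        for has_data in (False, True):
            for dyn in (False, True):
                outcome = hcat_write_outcome(immutable, has_data, dyn)
                print(immutable, has_data, dyn, outcome)
```

Note that no combination of inputs yields an overwrite, which is the point of
the feature request.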
> HCatalog - overwrite
> --------------------
>
> Key: PIG-5079
> URL: https://issues.apache.org/jira/browse/PIG-5079
> Project: Pig
> Issue Type: Improvement
> Components: build, data
> Affects Versions: 0.15.0
> Reporter: Praveen PentaReddy
> Priority: Minor
>
> In HCatalog, I am reading a Hive table, transforming the data, and then
> want to overwrite the same Hive table with the result. However, when writing
> back to the same table, the transformed data is appended rather than
> overwriting the existing data.
> It would be good to have an option such as overwrite, so that we can choose
> between overwrite and append as needed, as Hive offers when importing data
> into a Hive table.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)