[ 
https://issues.apache.org/jira/browse/PIG-5079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802217#comment-15802217
 ] 

Sushanth Sowmyan commented on PIG-5079:
---------------------------------------

Yup, looking through the code and some of its history, it looks like overwrite
is not supported by HCatalog; it requires an explicit drop-and-recreate. I was
wrong earlier.

When append support was added to HCatalog, HCatalog writes started behaving
like Hive INSERT-INTO. As a result, writes to a table or partition that already
held data appended to it instead of failing. That broke workflows for some
people who expected HCatalog to fail in those cases and relied on that failure
to catch error scenarios. The "immutable"="true" table property was therefore
introduced so that people could keep depending on HCatalog to error out when
data is already present. For consistency, Hive INSERT-INTO behaviour was also
changed so that it fails when data is already present. Neither change helps
[~reddyppr] in his scenario, since both make the write fail rather than
overwrite.

So an INSERT-OVERWRITE analogue for HCatalog would currently be a feature
request, and a useful one to be sure.

To be clear, the behaviour is now as follows:

a) By default, HCatalog treats every write as an INSERT-INTO, appending to any
existing data.
b) If the table has "immutable"="true", append is disabled: if the
table/partition has no data, the insert works, but if it has data, the job
fails.
c) Appending is not supported for dynamic-partitioning writes. If a user writes
with dynamic partitioning and any partition written to by the job already has
data, the job fails at runtime.
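
Rules (a) to (c) amount to a small decision table. The Python sketch below is a hypothetical model of that table only. `hcat_write_outcome` and `Outcome` are illustrative names and are not part of HCatalog or Pig. The sketch assumes "has data" is evaluated per target table or partition.

```python
from enum import Enum


class Outcome(Enum):
    WRITE = "write"    # target is empty, so the data is simply written
    APPEND = "append"  # target has data; new data is appended (INSERT-INTO style)
    FAIL = "fail"      # the job fails at runtime


def hcat_write_outcome(has_data: bool, immutable: bool,
                       dynamic_partitioning: bool) -> Outcome:
    """Hypothetical model of rules (a)-(c) above; not an HCatalog API."""
    if not has_data:
        # An empty table/partition accepts the write under every rule.
        return Outcome.WRITE
    if immutable:
        # Rule (b): "immutable"="true" disables append.
        return Outcome.FAIL
    if dynamic_partitioning:
        # Rule (c): append is not supported for dynamic-partitioning writes.
        return Outcome.FAIL
    # Rule (a): the default is INSERT-INTO-style append.
    return Outcome.APPEND


if __name__ == "__main__":
    for has_data in (False, True):
        for immutable in (False, True):
            for dyn in (False, True):
                result = hcat_write_outcome(has_data, immutable, dyn)
                print(f"has_data={has_data} immutable={immutable} "
                      f"dyn_part={dyn} -> {result.value}")
```

No combination of inputs yields an overwrite. That matches the point above: replacing existing data still requires an explicit drop-and-recreate.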

> HCatalog - overwrite
> --------------------
>
>                 Key: PIG-5079
>                 URL: https://issues.apache.org/jira/browse/PIG-5079
>             Project: Pig
>          Issue Type: Improvement
>          Components: build, data
>    Affects Versions: 0.15.0
>            Reporter: Praveen PentaReddy
>            Priority: Minor
>
> In HCatalog, I am using a Hive table and doing a transformation, and after
> completing the transformation I want to overwrite the data back to the same
> Hive table. However, when writing the data back to the same table, the
> transformed data is appended rather than overwriting the existing data.
> It would be good to have an option like overwrite, so that we can choose
> overwrite or append as needed, as Hive offers when importing data into a
> Hive table.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
