[ 
https://issues.apache.org/jira/browse/HIVE-24235?focusedWorklogId=495904&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495904
 ]

ASF GitHub Bot logged work on HIVE-24235:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 06/Oct/20 13:37
            Start Date: 06/Oct/20 13:37
    Worklog Time Spent: 10m 
      Work Description: klcopp opened a new pull request #1558:
URL: https://github.com/apache/hive/pull/1558


   ### What changes were proposed in this pull request?
   Resolve the table again after compaction is finished; compare the id with 
the table id from when compaction began. If the ids do not match, abort the 
compaction's transaction.
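
A minimal sketch of that check, assuming an in-memory stand-in for the metastore. `TableRef`, `Catalog`, `Outcome` and `compact` are hypothetical names used only for illustration; they are not Hive's API and not the code in this pull request.

```java
import java.util.HashMap;
import java.util.Map;

public class CompactionTableIdCheck {

    /** Hypothetical stand-in for a metastore table entry: only name and id matter here. */
    static final class TableRef {
        final String name;
        final long id;

        TableRef(String name, long id) {
            this.name = name;
            this.id = id;
        }
    }

    /** Hypothetical in-memory catalog; a real worker would resolve tables via the metastore. */
    static final class Catalog {
        private final Map<String, TableRef> tables = new HashMap<>();
        private long nextId = 1;

        TableRef create(String name) {
            // Every (re)creation gets a fresh id, even if the name is reused.
            TableRef t = new TableRef(name, nextId++);
            tables.put(name, t);
            return t;
        }

        void drop(String name) {
            tables.remove(name);
        }

        TableRef resolve(String name) {
            return tables.get(name);
        }
    }

    enum Outcome { COMMITTED, ABORTED }

    /**
     * Runs the job, then resolves the table again and compares ids.
     * A missing table or a different id means the table was dropped
     * (and possibly recreated) while the job ran, so the result is ABORTED.
     */
    static Outcome compact(Catalog catalog, String tableName, Runnable job) {
        TableRef before = catalog.resolve(tableName);
        if (before == null) {
            return Outcome.ABORTED;
        }
        job.run();
        TableRef after = catalog.resolve(tableName);
        boolean sameTable = after != null && after.id == before.id;
        return sameTable ? Outcome.COMMITTED : Outcome.ABORTED;
    }

    public static void main(String[] args) {
        Catalog catalog = new Catalog();
        catalog.create("c");

        // Table untouched while the job runs: ids match.
        System.out.println(compact(catalog, "c", () -> { }));

        // Table dropped and recreated while the job runs: ids differ.
        System.out.println(compact(catalog, "c", () -> {
            catalog.drop("c");
            catalog.create("c");
        }));
    }
}
```

Comparing ids rather than names is what catches the drop-and-recreate case in this sketch: the recreated table has the same name but a different id.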
   
   ### Why are the changes needed?
   
   ### Does this PR introduce _any_ user-facing change?
   
   ### How was this patch tested?
   Manual tests


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

            Worklog Id:     (was: 495904)
    Remaining Estimate: 0h
            Time Spent: 10m

> Drop and recreate table during MR compaction leaves behind base/delta 
> directory
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-24235
>                 URL: https://issues.apache.org/jira/browse/HIVE-24235
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Karen Coppage
>            Assignee: Karen Coppage
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> If a table is dropped and recreated during MR compaction, the table directory 
> and a base (or delta, if minor compaction) directory could be created, with 
> or without data, while the table "does not exist".
> E.g.
> {code:java}
> create table c (i int) stored as orc tblproperties 
> ("NO_AUTO_COMPACTION"="true", "transactional"="true");
> insert into c values (9);
> insert into c values (9);
> alter table c compact 'major';
> -- While the compaction job is running:
> drop table c;
> create table c (i int) stored as orc tblproperties 
> ("NO_AUTO_COMPACTION"="true", "transactional"="true");
> {code}
> The table directory should be empty, but it could look like this after the 
> job finishes:
> {code:java}
> Oct  6 14:23 c/base_0000002_v0000101/._orc_acid_version.crc
> Oct  6 14:23 c/base_0000002_v0000101/.bucket_00000.crc
> Oct  6 14:23 c/base_0000002_v0000101/_orc_acid_version
> Oct  6 14:23 c/base_0000002_v0000101/bucket_00000
> {code}
> or perhaps just: 
> {code:java}
> Oct  6 14:23 c/base_0000002_v0000101/._orc_acid_version.crc
> Oct  6 14:23 c/base_0000002_v0000101/_orc_acid_version
> {code}
> Insert another row and you have:
> {code:java}
> Oct  6 14:33 base_0000002_v0000101/
> Oct  6 14:33 base_0000002_v0000101/._orc_acid_version.crc
> Oct  6 14:33 base_0000002_v0000101/.bucket_00000.crc
> Oct  6 14:33 base_0000002_v0000101/_orc_acid_version
> Oct  6 14:33 base_0000002_v0000101/bucket_00000
> Oct  6 14:35 delta_0000001_0000001_0000/._orc_acid_version.crc
> Oct  6 14:35 delta_0000001_0000001_0000/.bucket_00000_0.crc
> Oct  6 14:35 delta_0000001_0000001_0000/_orc_acid_version
> Oct  6 14:35 delta_0000001_0000001_0000/bucket_00000_0
> {code}
> Selecting from the table then fails with the error below: the leftover base 
> directory is named for writeId 2, but the highest valid writeId for the 
> recreated table is 1:
> {code:java}
> thrift.ThriftCLIService: Error fetching results: 
> org.apache.hive.service.cli.HiveSQLException: Unable to get the next row set
>         at 
> org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:482)
>  ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> ...
> Caused by: java.io.IOException: java.lang.RuntimeException: ORC split 
> generation failed with exception: java.io.IOException: Not enough history 
> available for (1,x).  Oldest available base: 
> .../warehouse/b/base_0000004_v0000092
> {code}
> Solution: Resolve the table again after compaction is finished; compare the 
> id with the table id from when compaction began. If the ids do not match, 
> abort the compaction's transaction.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
