[
https://issues.apache.org/jira/browse/HIVE-25915?focusedWorklogId=720851&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-720851
]
ASF GitHub Bot logged work on HIVE-25915:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 04/Feb/22 12:54
Start Date: 04/Feb/22 12:54
Worklog Time Spent: 10m
Work Description: veghlaci05 opened a new pull request #3000:
URL: https://github.com/apache/hive/pull/3000
### What changes were proposed in this pull request?
Minor compaction is not possible on a table that contains non-ACID data (either
in delta or original files).
This PR prevents a minor compaction from being executed on tables that match
this criterion. Furthermore, for these tables the initiator will submit a MAJOR
compaction instead of a MINOR one.
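The fallback described above can be illustrated with a minimal sketch. It is not the code from this pull request: the class `CompactionTypeSketch`, the method `resolve`, and the `hasNonAcidData` flag are hypothetical names chosen for illustration, and how Hive actually detects non-ACID files is not shown.

```java
// Hypothetical sketch only; not the actual Hive Initiator implementation.
public class CompactionTypeSketch {

    enum CompactionType { MINOR, MAJOR }

    /**
     * Picks the compaction type to submit.
     * hasNonAcidData is assumed to be computed elsewhere, e.g. by checking
     * whether any delta or original file lacks the ACID row metadata.
     */
    static CompactionType resolve(CompactionType requested, boolean hasNonAcidData) {
        // A MINOR compaction cannot handle non-ACID rows, so upgrade it to MAJOR.
        if (requested == CompactionType.MINOR && hasNonAcidData) {
            return CompactionType.MAJOR;
        }
        return requested;
    }

    public static void main(String[] args) {
        System.out.println(resolve(CompactionType.MINOR, true));   // MAJOR
        System.out.println(resolve(CompactionType.MINOR, false));  // MINOR
        System.out.println(resolve(CompactionType.MAJOR, true));   // MAJOR
    }
}
```

In this sketch only the combination of a MINOR request and non-ACID data changes the outcome; every other request passes through unchanged.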
### Why are the changes needed?
Executing a MINOR compaction on a table with non-ACID data results in an NPE
when the ACID schema is applied to a non-ACID row.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Through automated tests.
Issue Time Tracking
-------------------
Worklog Id: (was: 720851)
Remaining Estimate: 0h
Time Spent: 10m
> Query based MINOR compaction fails with NPE if the data is loaded into the
> ACID table
> -------------------------------------------------------------------------------------
>
> Key: HIVE-25915
> URL: https://issues.apache.org/jira/browse/HIVE-25915
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Reporter: László Végh
> Assignee: László Végh
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Steps to reproduce:
> # Create a table with import:
> {{CREATE TABLE temp_acid(id string, value string) CLUSTERED BY(id) INTO 10
> BUCKETS STORED AS ORC TBLPROPERTIES('transactional'='true');}}
> # {{insert into temp_acid values
> ('1','one'),('2','two'),('3','three'),('4','four'),('5','five'),('6','six'),('7','seven'),('8','eight'),('9','nine'),('10','ten'),('11','eleven'),('12','twelve'),('13','thirteen'),('14','fourteen'),('15','fifteen'),('16','sixteen'),('17','seventeen'),('18','eighteen'),('19','nineteen'),('20','twenty');}}
> {{export table temp_acid to '/tmp/temp_acid';}}
> {{import table imported from '/tmp/temp_acid';}}
> # Do some inserts:
> {{insert into imported values ('21', 'value21'),('84', 'value84'),('66',
> 'value66'),('54', 'value54');
> insert into imported values ('22', 'value22'),('34', 'value34'),('35',
> 'value35');
> insert into imported values ('75', 'value75'),('99', 'value99');}}
> # {{Run a minor compaction}}
> If the data is loaded or imported into the table the way it is described
> above, the rows in the ORC file don't contain the ACID metadata. The
> query-based MINOR compaction fails on this kind of table because the
> FileSinkOperator throws an NPE when it tries to read the bucket metadata
> from the rows. Deleting from and updating a table like this is possible,
> however, so the bucketId can evidently be calculated for such rows.
> The non-query based MINOR compaction works fine on a table like this.