[
https://issues.apache.org/jira/browse/HIVE-25521?focusedWorklogId=652977&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-652977
]
ASF GitHub Bot logged work on HIVE-25521:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 20/Sep/21 13:54
Start Date: 20/Sep/21 13:54
Worklog Time Spent: 10m
Work Description: pgaref commented on a change in pull request #2639:
URL: https://github.com/apache/hive/pull/2639#discussion_r712189151
##########
File path:
ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFileStripeMergeRecordReader.java
##########
@@ -63,8 +63,20 @@ public void testSplitStartsWithOffset() throws IOException {
FileSplit split = new FileSplit(tmpPath, offset, length, (String[])null);
OrcFileStripeMergeRecordReader reader = new OrcFileStripeMergeRecordReader(conf, split);
reader.next(key, value);
+ // since offset is non-zero this file will not be processed.
+ Assert.assertNull(key.getInputPath());
+ split = new FileSplit(tmpPath, 0, length, (String[]) null);
Review comment:
Offset is actually Zero here -- what am I missing?
##########
File path:
ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFileStripeMergeRecordReader.java
##########
@@ -80,7 +80,7 @@ public boolean next(OrcFileKeyWrapper key, OrcFileValueWrapper value) throws IOE
}
protected boolean nextStripe(OrcFileKeyWrapper keyWrapper, OrcFileValueWrapper valueWrapper)
- throws IOException {
+ throws IOException {
// missing stripe stats (old format). If numRows is 0 then its an empty file and no statistics
// is present. We have to differentiate no stats (empty file) vs missing stats (old format).
if ((stripeStatistics == null || stripeStatistics.isEmpty()) && reader.getNumberOfRows() > 0) {
Review comment:
would it make sense to have the ```start > 0``` check here instead?
Issue Time Tracking
-------------------
Worklog Id: (was: 652977)
Remaining Estimate: 0h
Time Spent: 10m
> Data corruption when concatenating files with different compressions in same
> table/partition
> --------------------------------------------------------------------------------------------
>
> Key: HIVE-25521
> URL: https://issues.apache.org/jira/browse/HIVE-25521
> Project: Hive
> Issue Type: Bug
> Reporter: Harish JP
> Assignee: Harish JP
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Currently, if files with different compressions are in the same directory,
> concatenate can fail and cause data corruption. This happens because one task
> can move a file as an incompatible file, after which the other tasks fail.
>
> This issue is addressed in this Jira by processing a file only in the one
> task that handles offset 0 and ignoring the file in all other tasks.
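
The rule described above can be illustrated with a minimal, self-contained sketch. It does not use the Hive or Hadoop classes: OffsetZeroOwnership, Split, ownsFile and decide are hypothetical names made up for this illustration, and the actual change in the pull request may be structured differently. The only assumption taken from the issue text is that a file is handled by the task whose split starts at offset 0 and is skipped by every other task.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the Hive implementation.
final class OffsetZeroOwnership {

    /** Hypothetical stand-in for a file split: path, start offset, length. */
    static final class Split {
        final String path;
        final long start;
        final long length;

        Split(String path, long start, long length) {
            this.path = path;
            this.start = start;
            this.length = length;
        }
    }

    /** A task owns a file only if its split begins at offset 0. */
    static boolean ownsFile(Split split) {
        return split.start == 0L;
    }

    /** Returns "process" for the owning task and "skip" for all others. */
    static String decide(Split split) {
        return ownsFile(split) ? "process" : "skip";
    }

    public static void main(String[] args) {
        // Two splits of the same file plus one split of a second file.
        List<Split> splits = new ArrayList<>();
        splits.add(new Split("a.orc", 0L, 100L));
        splits.add(new Split("a.orc", 100L, 100L));
        splits.add(new Split("b.orc", 0L, 50L));

        for (Split s : splits) {
            System.out.println(s.path + " @ " + s.start + " -> " + decide(s));
        }
    }
}
```

With this rule, exactly one split per file satisfies ownsFile, so a file can be moved as incompatible by at most one task.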