Gopal V created HIVE-13161:
------------------------------

Summary: ORC: Always do sloppy overlaps for DiskRanges
Key: HIVE-13161
URL: https://issues.apache.org/jira/browse/HIVE-13161
Project: Hive
Issue Type: Bug
Affects Versions: 1.3.0, 2.1.0
Reporter: Gopal V
Assignee: Prasanth Jayachandran

The selected columns are sometimes only a few bytes apart (particularly for nulls, which compress tightly), yet the reads are not merged. mergeDiskRanges() below collapses only ranges that overlap or touch exactly, so even a few bytes of separation between two streams results in a separate read. The WORST_UNCOMPRESSED_SLOP is only applied in the PPD (predicate pushdown) case, and there it is applied more for safety than to reduce the total number of round-trip calls to the filesystem.

{code}
  /**
   * Update the disk ranges to collapse adjacent or overlapping ranges. It
   * assumes that the ranges are sorted.
   * @param ranges the list of disk ranges to merge
   */
  static void mergeDiskRanges(List<DiskRange> ranges) {
    DiskRange prev = null;
    for (int i = 0; i < ranges.size(); ++i) {
      DiskRange current = ranges.get(i);
      // Merge only when the ranges overlap or touch; no slop for small gaps.
      if (prev != null && overlap(prev.offset, prev.end,
          current.offset, current.end)) {
        prev.offset = Math.min(prev.offset, current.offset);
        prev.end = Math.max(prev.end, current.end);
        ranges.remove(i);
        i -= 1; // re-examine the element that shifted into slot i
      } else {
        prev = current;
      }
    }
  }
...
  // True if [leftA, rightA] and [leftB, rightB] overlap or touch;
  // a gap of even one byte returns false.
  private static boolean overlap(long leftA, long rightA,
                                 long leftB, long rightB) {
    if (leftA <= leftB) {
      return rightA >= leftB;
    }
    return rightB >= leftA;
  }
{code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
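
A minimal sketch of what "always sloppy" merging could look like, assuming the ranges are sorted by offset and treating any gap of up to `maxGap` bytes as mergeable. `SloppyMerge`, the stand-in `DiskRange` class, the two-argument `mergeDiskRanges(List, long)` and the 64-byte slop are all hypothetical illustrations, not Hive's actual classes or the real WORST_UNCOMPRESSED_SLOP value. With `maxGap == 0` the merge condition reduces to the `overlap()` check in the issue description, so the strict behaviour is the special case.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: not the Hive/ORC implementation.
public class SloppyMerge {

  /** Hypothetical minimal stand-in for ORC's DiskRange: bytes [offset, end). */
  static final class DiskRange {
    long offset;
    long end;

    DiskRange(long offset, long end) {
      this.offset = offset;
      this.end = end;
    }

    @Override
    public String toString() {
      return "[" + offset + ", " + end + ")";
    }
  }

  /**
   * Merges ranges sorted by offset, in place. A range is folded into the
   * previous one when it starts at most maxGap bytes after prev.end
   * (a negative gap means the ranges overlap). maxGap == 0 reproduces the
   * strict overlap-or-touch rule; maxGap > 0 reads a few unneeded bytes
   * in exchange for fewer filesystem round trips.
   */
  static void mergeDiskRanges(List<DiskRange> ranges, long maxGap) {
    DiskRange prev = null;
    for (int i = 0; i < ranges.size(); ++i) {
      DiskRange current = ranges.get(i);
      // Sorted by offset, so prev.offset <= current.offset and only the
      // hole between prev.end and current.offset needs checking.
      if (prev != null && current.offset - prev.end <= maxGap) {
        prev.end = Math.max(prev.end, current.end);
        ranges.remove(i);
        i -= 1; // re-examine the element that shifted into slot i
      } else {
        prev = current;
      }
    }
  }

  public static void main(String[] args) {
    // Two streams separated by a 10-byte hole.
    List<DiskRange> strict = new ArrayList<>();
    strict.add(new DiskRange(0, 100));
    strict.add(new DiskRange(110, 200));

    List<DiskRange> sloppy = new ArrayList<>();
    sloppy.add(new DiskRange(0, 100));
    sloppy.add(new DiskRange(110, 200));

    mergeDiskRanges(strict, 0);  // strict: the 10-byte gap blocks merging
    mergeDiskRanges(sloppy, 64); // hypothetical 64-byte slop

    System.out.println(strict); // [[0, 100), [110, 200)]
    System.out.println(sloppy); // [[0, 200)]
  }
}
```

The trade-off in this sketch is explicit: each sloppy merge may read up to `maxGap` bytes that nobody asked for, which is usually cheaper than issuing an extra round-trip read to the filesystem for a stream that sits only a few bytes away.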