GitHub user alanfgates opened a pull request:
https://github.com/apache/orc/pull/179
ORC-255
This is not ready for commit. I'm just putting it up so people can start
looking at it and giving feedback.
As noted in the JIRA, this only deals with ACID2 and the vector batch
interface.
This depends on an unreleased version of Hive's storage-api. TestRecordReaderImpl
also currently fails because of changes to storage-api's DiskRangeList.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/alanfgates/orc orc255
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/orc/pull/179.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #179
----
commit 96026a342bf531c9c12b3cc8a127f33026cba6b9
Author: Alan Gates <[email protected]>
Date: 2017-09-15T18:40:01Z
WIP Ported parsing parts of Hive's AcidUtils into AcidDirectoryParser and
supporting classes. Haven't finished the testing yet.
commit 12477e216caee814fd3c6545a3a7c938d54369b8
Author: Alan Gates <[email protected]>
Date: 2017-09-26T22:57:50Z
Finished testing AcidDirectoryParser.
commit 096072c6c6628f1bcee4ec931ec785e136e11e23
Author: Alan Gates <[email protected]>
Date: 2017-09-27T00:15:28Z
Changed AcidVersionedDirectory to track txn information for files in
addition to just FileStatus.
commit df66d52047938c239948bd04559c95d4fcac2227
Author: Alan Gates <[email protected]>
Date: 2017-09-27T22:45:46Z
Moved AcidVersionedDirectory to ParsedAcidDirectory to better fit with
terminology of AcidDirectoryParser and ParsedAcidFile. Added ability to
determine whether a given input file from the directory should be read and to
determine which delete deltas to use for a given input file. Fixed a number of
bugs I found along the way.
commit 111e1308a0cbf79862e80c97f5c6ca9c78b38273
Author: Alan Gates <[email protected]>
Date: 2017-09-28T20:00:12Z
Added ability to read insert files (base and normal delta). Haven't yet
done delete files.
commit b8a7e6d7da40e83d193140d565463caf83379ee1
Author: Alan Gates <[email protected]>
Date: 2017-09-30T00:59:09Z
WIP, wrote the initial code for handling the deletes. Haven't tested it
yet.
commit 9146dd6020d63694e0b5773b2f092c102e78b0da
Author: Alan Gates <[email protected]>
Date: 2017-10-03T19:54:09Z
Fixed a bunch of errors in delete handling. Added unit tests for delete
testing.
commit 90ff039b83c2a198b5b7117b8c554c989a374af7
Author: Alan Gates <[email protected]>
Date: 2017-10-04T23:37:25Z
Went overboard on caching delete sets. I'm going to simplify this a bunch
and remove the caching. But checking in now in case I change my mind and
decide to go back to the caching.
commit acaabe6272e57e2bce0c9af5f74d61a2e1510709
Author: Alan Gates <[email protected]>
Date: 2017-10-05T00:53:30Z
Simplified delete sets to be attached to a ParsedAcidDirectory instead of
trying to cache them. That leaves it up to the user to make sure there aren't
too many ParsedAcidDirectories live in a process, each with its own DeleteSet.
commit ed77b1e89a390c2c451b821a84f4a76595ad3cda
Author: Alan Gates <[email protected]>
Date: 2017-10-12T00:04:23Z
Most likely useless changes. I don't think I need the
MergingAcidRecordReader. But keeping it for now in case I turn out to be
wrong. It has happened before.
----