[
https://issues.apache.org/jira/browse/IMPALA-9515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zoltán Borók-Nagy resolved IMPALA-9515.
---------------------------------------
Resolution: Fixed
> Milestone 3: Reading “original files”
> -------------------------------------
>
> Key: IMPALA-9515
> URL: https://issues.apache.org/jira/browse/IMPALA-9515
> Project: IMPALA
> Issue Type: Sub-task
> Reporter: Zoltán Borók-Nagy
> Assignee: Zoltán Borók-Nagy
> Priority: Major
> Labels: impala-acid
>
> “Original files” don’t store the special ACID columns, so we need to
> auto-generate those values. In practice we only need to auto-generate the
> record ID: (originalTransaction, bucket, rowId).
> * originalTransaction: can be parsed from the containing directory
> ** If it’s the table root directory then originalTransaction is 0
> * bucket: the bit-packed value of (bucket codec version, bucket id,
> statement id)
> ** Bucket codec version is 1
> ** Bucket id can be parsed from the filename
> ** Statement id can be parsed from the delta directory:
> *** delta_<min_writeid>_<max_writeid>_<statement_id>
> *** (min_writeid = max_writeid for original files)
> * rowId: zero-based within each bucket. If there are multiple files in a
> single bucket:
> ** List all the files belonging to the bucket
> ** First file’s first row id is 0
> ** Next file’s first row id is the row count of the first file
> ** And so on
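The first two components of the record ID described above can be sketched as follows. This is only an illustration, not Impala's implementation: the function names are hypothetical, the bit layout is assumed to match Hive's BucketCodec V1 (3 bits version, 1 reserved, 12 bits bucket id, 4 reserved, 12 bits statement id), and the file naming (leading digits such as "000001_0" give the bucket id) is likewise an assumption. Neither is stated in the issue.

```python
import re

# Directory name pattern from the issue: delta_<min_writeid>_<max_writeid>_<statement_id>
_DELTA_RE = re.compile(r"^delta_(\d+)_(\d+)_(\d+)$")

BUCKET_CODEC_VERSION = 1  # stated in the issue


def parse_original_dir(dir_name, is_table_root=False):
    """Return (original_transaction, statement_id) for the containing directory."""
    if is_table_root:
        # The issue says originalTransaction is 0 for the table root.
        # Using statement id 0 here is an assumption.
        return 0, 0
    match = _DELTA_RE.match(dir_name)
    if match is None:
        raise ValueError(f"not a delta directory: {dir_name!r}")
    min_writeid, max_writeid, statement_id = (int(g) for g in match.groups())
    if min_writeid != max_writeid:
        # The issue notes min_writeid = max_writeid for original files.
        raise ValueError(f"unexpected write id range in {dir_name!r}")
    return min_writeid, statement_id


def parse_bucket_id(file_name):
    """Assumed naming: the leading digits before the first '_' are the bucket id."""
    match = re.match(r"^(\d+)_", file_name)
    if match is None:
        raise ValueError(f"cannot find a bucket id in {file_name!r}")
    return int(match.group(1))


def encode_bucket(bucket_id, statement_id, version=BUCKET_CODEC_VERSION):
    """Bit-pack (version, bucket id, statement id) using the assumed V1 layout."""
    if not 0 <= bucket_id < (1 << 12):
        raise ValueError("bucket id does not fit in 12 bits")
    if not 0 <= statement_id < (1 << 12):
        raise ValueError("statement id does not fit in 12 bits")
    return (version << 29) | (bucket_id << 16) | statement_id
```

For example, under these assumptions a file named 000001_0 in delta_0000005_0000005_0002 would get originalTransaction 5 and bucket encode_bucket(1, 2).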
> The frontend should generate the base record ID for each file and propagate
> that information to the scanners. The scanners can then tell whether they
> are scanning files in full ACID format or in raw format. The ORC scanner
> needs to be changed to generate and fill in the ACID columns for original
> files.
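A rough sketch of the split described above: the planning side computes a base row id per file, and the scanning side adds the row's position within the file. The function names are hypothetical, and ordering the files by name is an assumption; the issue only says to list all the files belonging to the bucket.

```python
from itertools import accumulate


def base_row_ids(files):
    """Map each file of one bucket to its first row id.

    files: iterable of (file_name, row_count) pairs for a single bucket.
    """
    # Assumption: files are ordered by name; any stable, agreed-upon order works.
    ordered = sorted(files)
    counts = [row_count for _, row_count in ordered]
    # First file starts at 0, each next file starts after all previous rows.
    starts = [0] + list(accumulate(counts))[:-1]
    return {name: start for (name, _), start in zip(ordered, starts)}


def record_ids(original_transaction, bucket, base_row_id, row_count):
    """Yield the (originalTransaction, bucket, rowId) triple for each row of one file."""
    for offset in range(row_count):
        yield (original_transaction, bucket, base_row_id + offset)
```

With two files in the same bucket holding 5 and 3 rows, the first file's rows get row ids 0 to 4 and the second file's rows get 5 to 7.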