[ 
https://issues.apache.org/jira/browse/IMPALA-9515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy updated IMPALA-9515:
--------------------------------------
    Fix Version/s: Impala 4.0

> Milestone 3: Reading “original files”
> -------------------------------------
>
>                 Key: IMPALA-9515
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9515
>             Project: IMPALA
>          Issue Type: Sub-task
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Zoltán Borók-Nagy
>            Priority: Major
>              Labels: impala-acid
>             Fix For: Impala 4.0
>
>
> “Original files” don’t store the special ACID columns, so we need to 
> auto-generate those values. In practice, we only need to auto-generate the 
> record id: (originalTransaction, bucket, rowId).
>  * originalTransaction: can be parsed from the containing directory
>  ** If it’s the table root directory then originalTransaction is 0
>  * bucket: the bit-packed value of (bucket codec version, bucket id, 
> statement id)
>  ** Bucket codec version is 1
>  ** Bucket id can be parsed from the filename
>  ** Statement id can be parsed from the delta directory:
>  *** delta_<min_writeid>_<max_writeid>_<statement_id>
>  *** (min_writeid = max_writeid for original files)
>  * rowId: zero-based within each bucket. If there are multiple files in a 
> single bucket:
>  ** List all the files belonging to the bucket
>  ** First file’s first row id is 0
>  ** Next file’s first row id is the row count of the first file
>  ** And so on
> The frontend should generate the base record ID for each file and propagate 
> that information to the scanners. This way the scanners know whether they 
> are scanning files in full ACID format or in raw (original) format. The ORC 
> scanner needs to be changed to generate and fill the ACID columns for 
> original files.
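
The rules quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Impala's frontend code: the helper names are hypothetical, the bucket id is assumed to be the leading digits of the file name (e.g. 000001_0), files within a bucket are assumed to be ordered by name, and the bit layout (version shifted left by 29, bucket id shifted left by 16, statement id in the low bits) is my understanding of Hive's V1 bucket codec rather than something the description states.

```python
import re
from collections import defaultdict

BUCKET_CODEC_VERSION = 1  # stated in the description

# delta_<min_writeid>_<max_writeid>_<statement_id>; the statement id is
# treated as optional here (assumption) and defaults to 0 when absent.
DELTA_RE = re.compile(r"^delta_(\d+)_(\d+)(?:_(\d+))?$")


def parse_directory(dir_name):
    """Return (original_transaction, statement_id) for the containing directory.

    Any name that is not a delta directory is treated as the table root,
    for which originalTransaction is 0.
    """
    m = DELTA_RE.match(dir_name)
    if m is None:
        return 0, 0
    min_write_id, _max_write_id, stmt = m.groups()
    return int(min_write_id), int(stmt) if stmt is not None else 0


def parse_bucket_id(file_name):
    """Assumed naming scheme: leading digits before the first underscore."""
    m = re.match(r"^(\d+)_", file_name)
    if m is None:
        raise ValueError("cannot parse bucket id from %r" % file_name)
    return int(m.group(1))


def encode_bucket(bucket_id, statement_id):
    """Bit-pack (codec version, bucket id, statement id); layout is assumed."""
    return (BUCKET_CODEC_VERSION << 29) | (bucket_id << 16) | statement_id


def base_record_ids(dir_name, files):
    """Map file name -> (originalTransaction, bucket, first rowId).

    `files` is a list of (file_name, row_count) pairs from one directory.
    """
    original_txn, statement_id = parse_directory(dir_name)
    next_row_id = defaultdict(int)  # running row count per bucket id
    result = {}
    for name, row_count in sorted(files):  # ordering by name is an assumption
        bucket_id = parse_bucket_id(name)
        bucket = encode_bucket(bucket_id, statement_id)
        result[name] = (original_txn, bucket, next_row_id[bucket_id])
        next_row_id[bucket_id] += row_count
    return result


ids = base_record_ids(
    "delta_5_5_0",
    [("000000_0", 10), ("000000_0_copy_1", 4), ("000001_0", 7)],
)
```

With these inputs, the second file of bucket 0 starts at rowId 10 (the row count of the first file), while bucket 1 starts again at 0. A scanner that receives such a base record ID would only need to add the row's position within the file to obtain the rowId.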


