[ 
https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15451193#comment-15451193
 ] 

Lefty Leverenz edited comment on HIVE-14233 at 5/25/17 3:59 AM:
----------------------------------------------------------------

Doc note:  This adds the configuration parameter 
*hive.transactional.events.mem* to HiveConf.java in release 2.2.0, so it will 
need to be documented in the wiki.

* [Configuration Properties -- Transactions and Compactor | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-TransactionsandCompactor]
* [Hive Transactions -- New Configuration Parameters for Transactions | 
https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-NewConfigurationParametersforTransactions]

Added a TODOC2.2 label.

Edit (24/May/17):  It's release 2.3.0, not 2.2.0, so I changed the TODOC label.


was (Author: [email protected]):
Doc note:  This adds the configuration parameter 
*hive.transactional.events.mem* to HiveConf.java in release 2.2.0, so it will 
need to be documented in the wiki.

* [Configuration Properties -- Transactions and Compactor | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-TransactionsandCompactor]
* [Hive Transactions -- New Configuration Parameters for Transactions | 
https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-NewConfigurationParametersforTransactions]

Added a TODOC2.2 label.

> Improve vectorization for ACID by eliminating row-by-row stitching
> ------------------------------------------------------------------
>
>                 Key: HIVE-14233
>                 URL: https://issues.apache.org/jira/browse/HIVE-14233
>             Project: Hive
>          Issue Type: New Feature
>          Components: Transactions, Vectorization
>            Reporter: Saket Saurabh
>            Assignee: Saket Saurabh
>              Labels: TODOC2.3
>             Fix For: 2.3.0
>
>         Attachments: HIVE-14233.01.patch, HIVE-14233.02.patch, 
> HIVE-14233.03.patch, HIVE-14233.04.patch, HIVE-14233.05.patch, 
> HIVE-14233.06.patch, HIVE-14233.07.patch, HIVE-14233.08.patch, 
> HIVE-14233.09.patch, HIVE-14233.10.patch, HIVE-14233.11.patch, 
> HIVE-14233.12.patch
>
>
> This JIRA proposes to improve vectorization for ACID by eliminating 
> row-by-row stitching when reading back ACID files. In the current 
> implementation, a vectorized row batch is created by populating the batch one 
> row at a time, before the vectorized batch is passed up along the operator 
> pipeline. This row-by-row stitching was necessary because 
> the ACID insert/update/delete events from various delta files had to be 
> merged together before the actual version of a given row could be determined. 
> HIVE-14035 has enabled us to break away from that limitation by splitting 
> ACID update events into a combination of delete+insert. In fact, it has now 
> enabled us to create splits on delta files.
> Building on top of HIVE-14035, this JIRA proposes to remove this 
> bottleneck in the vectorized code path for ACID by reading row batches 
> directly from the underlying ORC files and avoiding any stitching altogether. 
> Once a row batch is read from the split (which may be on a base or delta file), 
> the deleted rows will be found by cross-referencing the batch against a data 
> structure that tracks only delete events (found in the 
> delete_delta files). This will lead to a large performance gain when reading 
> ACID files in vectorized fashion, and it enables further optimizations to be 
> built on top of it in the future.
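
The cross-referencing step described above can be sketched roughly as follows. This is a minimal illustration, not the code from the attached patches: the class DeleteFilterSketch, its methods, and the (originalTransaction, bucket, rowId) key layout are hypothetical names chosen for the example, and a plain hash set stands in for whatever data structure the real reader uses to hold delete events. The idea shown is only that a whole batch is read first and deleted rows are then filtered out by lookup, instead of merging events one row at a time.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of batch-level delete filtering; not Hive's actual classes. */
final class DeleteFilterSketch {

    // Keys of all delete events, assumed to have been loaded from the delete deltas.
    // Each key is (originalTransaction, bucket, rowId); this layout is an assumption.
    private final Set<List<Long>> deleted = new HashSet<>();

    /** Records one delete event. */
    void addDeleteEvent(long originalTransaction, int bucket, long rowId) {
        deleted.add(Arrays.asList(originalTransaction, (long) bucket, rowId));
    }

    /**
     * Returns the indices of the rows in a batch that have no matching delete event.
     * The three arrays play the role of the ACID key columns of one row batch.
     */
    int[] selectSurvivors(long[] txns, int[] buckets, long[] rowIds, int size) {
        int[] selected = new int[size];
        int count = 0;
        for (int i = 0; i < size; i++) {
            List<Long> key = Arrays.asList(txns[i], (long) buckets[i], rowIds[i]);
            if (!deleted.contains(key)) {
                selected[count++] = i; // row i survives
            }
        }
        return Arrays.copyOf(selected, count);
    }

    public static void main(String[] args) {
        DeleteFilterSketch filter = new DeleteFilterSketch();
        filter.addDeleteEvent(1L, 0, 1L); // row (txn 1, bucket 0, rowId 1) was deleted

        long[] txns = {1L, 1L, 2L};
        int[] buckets = {0, 0, 0};
        long[] rowIds = {0L, 1L, 1L};

        int[] survivors = filter.selectSurvivors(txns, buckets, rowIds, 3);
        System.out.println(Arrays.toString(survivors)); // prints [0, 2]
    }
}
```

Returning a list of surviving indices, rather than copying rows, is a design choice made here so the batch itself stays untouched; whether the patch does the same is not stated in the description.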



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
