[
https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15451193#comment-15451193
]
Lefty Leverenz edited comment on HIVE-14233 at 5/25/17 3:59 AM:
----------------------------------------------------------------
Doc note: This adds the configuration parameter
*hive.transactional.events.mem* to HiveConf.java in release 2.2.0, so it will
need to be documented in the wiki.
* [Configuration Properties -- Transactions and Compactor |
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-TransactionsandCompactor]
* [Hive Transactions -- New Configuration Parameters for Transactions |
https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-NewConfigurationParametersforTransactions]
Added a TODOC2.2 label.
Edit (24/May/17): It's release 2.3.0, not 2.2.0, so I changed the TODOC label.
> Improve vectorization for ACID by eliminating row-by-row stitching
> ------------------------------------------------------------------
>
> Key: HIVE-14233
> URL: https://issues.apache.org/jira/browse/HIVE-14233
> Project: Hive
> Issue Type: New Feature
> Components: Transactions, Vectorization
> Reporter: Saket Saurabh
> Assignee: Saket Saurabh
> Labels: TODOC2.3
> Fix For: 2.3.0
>
> Attachments: HIVE-14233.01.patch, HIVE-14233.02.patch,
> HIVE-14233.03.patch, HIVE-14233.04.patch, HIVE-14233.05.patch,
> HIVE-14233.06.patch, HIVE-14233.07.patch, HIVE-14233.08.patch,
> HIVE-14233.09.patch, HIVE-14233.10.patch, HIVE-14233.11.patch,
> HIVE-14233.12.patch
>
>
> This JIRA proposes to improve vectorization for ACID by eliminating
> row-by-row stitching when reading back ACID files. In the current
> implementation, a vectorized row batch is created by populating the batch one
> row at a time, before the vectorized batch is passed up along the operator
> pipeline. Row-by-row stitching was required because
> the ACID insert/update/delete events from various delta files had to be
> merged before the current version of a given row could be determined.
> HIVE-14035 has enabled us to break away from that limitation by splitting
> ACID update events into a combination of delete+insert. In fact, it has now
> enabled us to create splits on delta files.
> Building on top of HIVE-14035, this JIRA proposes to solve this earlier
> bottleneck in the vectorized code path for ACID by now directly reading row
> batches from the underlying ORC files and avoiding any stitching altogether.
> Once a row batch is read from the split (which may be on a base/delta file),
> the deleted rows will be identified by cross-referencing the batch against a
> data structure that tracks only delete events (found in the
> delete_delta files). This should yield a large performance gain when reading
> ACID files in vectorized fashion, while enabling further optimizations
> on top of it in the future.
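
The cross-referencing step described above can be illustrated with a minimal Java sketch. This is not the code from the attached patches: the class `DeleteEventFilterSketch`, the `RowKey` layout (original transaction, bucket, row id), and the method names are all hypothetical, and a plain `HashSet` stands in for whatever structure the real reader uses to hold delete events. The idea shown is only that a batch is read as a whole and deleted rows are masked out through a selection array, rather than the batch being rebuilt one row at a time.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

/** Hypothetical sketch of batch-level delete filtering; not Hive's actual classes. */
public class DeleteEventFilterSketch {

    /** Assumed identity of one ACID row: (original transaction, bucket, row id). */
    static final class RowKey {
        final long originalTransaction;
        final int bucket;
        final long rowId;

        RowKey(long originalTransaction, int bucket, long rowId) {
            this.originalTransaction = originalTransaction;
            this.bucket = bucket;
            this.rowId = rowId;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof RowKey)) {
                return false;
            }
            RowKey other = (RowKey) o;
            return originalTransaction == other.originalTransaction
                    && bucket == other.bucket
                    && rowId == other.rowId;
        }

        @Override
        public int hashCode() {
            return Objects.hash(originalTransaction, bucket, rowId);
        }
    }

    // Delete events loaded up front (in the proposal, from the delete delta files).
    private final Set<RowKey> deleted = new HashSet<>();

    /** Registers one delete event. */
    public void addDeleteEvent(long originalTransaction, int bucket, long rowId) {
        deleted.add(new RowKey(originalTransaction, bucket, rowId));
    }

    /**
     * Returns the positions within the batch whose rows have not been deleted.
     * The column arrays are left untouched; only a selection array is produced.
     */
    public int[] selectLiveRows(long[] txns, int[] buckets, long[] rowIds, int size) {
        int[] selected = new int[size];
        int live = 0;
        for (int i = 0; i < size; i++) {
            if (!deleted.contains(new RowKey(txns[i], buckets[i], rowIds[i]))) {
                selected[live++] = i;
            }
        }
        return Arrays.copyOf(selected, live);
    }

    public static void main(String[] args) {
        DeleteEventFilterSketch filter = new DeleteEventFilterSketch();
        filter.addDeleteEvent(5L, 0, 1L);

        // A three-row batch, all written by transaction 5 into bucket 0.
        long[] txns = {5L, 5L, 5L};
        int[] buckets = {0, 0, 0};
        long[] rowIds = {0L, 1L, 2L};

        int[] live = filter.selectLiveRows(txns, buckets, rowIds, 3);
        System.out.println(Arrays.toString(live)); // prints [0, 2]
    }
}
```

Holding every delete event in memory trades space for lookup speed, which is presumably why the change also introduces a configuration parameter (*hive.transactional.events.mem*, per the doc note above); how that parameter is applied is not shown here.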