[jira] [Updated] (HIVE-15664) LLAP text cache: improve first query perf

Sergey Shelukhin (JIRA) Wed, 18 Jan 2017 20:00:46 -0800

     [ https://issues.apache.org/jira/browse/HIVE-15664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Sergey Shelukhin updated HIVE-15664:
------------------------------------
    Attachment: HIVE-15664.WIP.patch

This implements items 1 and 2, including the ORC dictionary change.
Skipping (item 3) is only supported on VectorDeserialize; I started looking at
it, and it should be easy to do after clearing up the initial confusion: VD
doesn't support complex types anyway, so it should be easy to map the new ORC
columns to the original column indexes.
We don't expect that to result in a major gain though (compared to 1-2-4), so I
postponed it for now.
Unfortunately, 1 and 2 don't speed it up enough. We need to do 4: return VRBs
from VectorDeserialize and offload ORC writing to a background thread. I was
looking into that today. I need to wrap my head around the variety of array
indexes and integer lists that various parts use. It would also be difficult
interface-wise. Will probably piggyback on Orc...Batch
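
To illustrate the idea behind item 4 only, here is a minimal sketch in plain
Java of forwarding each batch to the consumer immediately while queuing the
cache write on a background thread. It does not use Hive or ORC APIs and is not
taken from the patch: BackgroundWriteSketch, Batch, CacheWriter and process are
hypothetical names, and Batch is just a stand-in for a VRB.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BackgroundWriteSketch {
    /** Hypothetical stand-in for a decoded row batch; not Hive's VRB class. */
    static final class Batch {
        final long[] values;
        Batch(long[] values) { this.values = values; }
    }

    /** Hypothetical stand-in for the cache writer; it only records its input. */
    static final class CacheWriter {
        final List<Batch> written = new ArrayList<>();
        void write(Batch b) { written.add(b); }
    }

    /**
     * Hands every batch to the background writer, then processes it on the
     * calling thread without waiting for the write. Returns the sum of all
     * values, which stands in for the work done by the query pipeline.
     */
    static long process(List<Batch> batches, CacheWriter writer) throws Exception {
        // A single worker thread keeps the writes in submission order.
        ExecutorService background = Executors.newSingleThreadExecutor();
        List<Future<?>> pending = new ArrayList<>();
        long sum = 0;
        try {
            for (Batch b : batches) {
                // Queue the write; this call does not block on the write itself.
                pending.add(background.submit(() -> writer.write(b)));
                // "Pipeline" work proceeds while the write may still be running.
                for (long v : b.values) {
                    sum += v;
                }
            }
            // Wait for all writes so that failures surface to the caller.
            for (Future<?> f : pending) {
                f.get();
            }
        } finally {
            background.shutdown();
        }
        return sum;
    }
}
```

The sketch assumes a batch is not modified after it is handed off. If the
reader reused batch objects, each one would have to be copied before being
queued for the background write.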

> LLAP text cache: improve first query perf
> -----------------------------------------
>
>                 Key: HIVE-15664
>                 URL: https://issues.apache.org/jira/browse/HIVE-15664
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>         Attachments: HIVE-15664.WIP.patch
>
>
> 1) Don't use ORC dictionary.
> 2) Use VectorDeserialize.
> 3) Don't parse the columns that are not included (cannot avoid reading them).
> 4) Send VRB to the pipeline and write ORC in parallel (in background).
> Also add an option to disable the encoding pipeline server-side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
