[
https://issues.apache.org/jira/browse/HIVE-15664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sergey Shelukhin updated HIVE-15664:
------------------------------------
Attachment: HIVE-15664.WIP.patch
This implements 1-2, as well as ORC dictionary.
Skipping is only supported on VectorDeserialize; I started looking at it, and it
should be easy to do after clearing up the initial confusion - VD doesn't support
complex types anyway, so it should be easy to map new ORC cols to original column
indexes.
We don't expect that to result in a major gain though (compared to 1-2-4), so I
postponed it for now.
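To illustrate the index mapping mentioned above, here is a minimal sketch. It is not taken from the patch: the class and method names are hypothetical, and it assumes a flat schema (no complex types) where each included top-level column becomes exactly one written column.

```java
// Hypothetical helper for illustration only; not part of Hive or ORC.
final class IncludedColumnMapping {
    private IncludedColumnMapping() {}

    /**
     * included[i] is true when original column i should be parsed.
     * Returns an array where result[k] is the original index of the k-th
     * column that is actually written.
     */
    static int[] writtenToOriginal(boolean[] included) {
        int count = 0;
        for (boolean b : included) {
            if (b) count++;
        }
        int[] mapping = new int[count];
        int next = 0;
        for (int i = 0; i < included.length; i++) {
            if (included[i]) mapping[next++] = i;
        }
        return mapping;
    }
}
```

With included = {true, false, true, true}, the mapping is {0, 2, 3}.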
Unfortunately 1 and 2 don't speed it up enough... need to do 4 - return VRBs
from VectorDeserialize, and offload ORC writing to a background thread. I was
looking into that today. Need to wrap my head around the variety of array indexes
and integer lists that various parts use. Also, interface-wise it would be
difficult. Will probably piggyback on Orc...Batch
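A rough sketch of the shape of 4, under stated assumptions: each decoded batch goes to the pipeline right away, while a single background thread does the cache write. This is not the patch's code. BackgroundCacheWriter and its two Consumer callbacks are hypothetical stand-ins, and the sketch assumes a batch is not mutated or reused after handoff (real VRBs are typically reused, so actual code would need a copy or an ownership scheme).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical illustration only; B stands in for a row batch type.
final class BackgroundCacheWriter<B> implements AutoCloseable {
    // One thread, so batches are written in the order they were submitted.
    private final ExecutorService writerThread = Executors.newSingleThreadExecutor();
    private final List<Future<?>> pending = new ArrayList<>();
    private final Consumer<B> downstream;   // the query pipeline
    private final Consumer<B> cacheWriter;  // e.g. something that encodes and caches

    BackgroundCacheWriter(Consumer<B> downstream, Consumer<B> cacheWriter) {
        this.downstream = downstream;
        this.cacheWriter = cacheWriter;
    }

    /** Queue the slow write, then hand the batch to the pipeline immediately. */
    void onBatch(B batch) {
        pending.add(writerThread.submit(() -> cacheWriter.accept(batch)));
        downstream.accept(batch);
    }

    /** Wait for all queued writes to finish and surface any write failure. */
    @Override
    public void close() {
        writerThread.shutdown();
        try {
            for (Future<?> f : pending) {
                f.get();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for cache writes", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("cache write failed", e.getCause());
        }
    }
}
```

The point of the single-thread executor is that the pipeline never waits on the write, while the written order still matches the read order.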
> LLAP text cache: improve first query perf
> -----------------------------------------
>
> Key: HIVE-15664
> URL: https://issues.apache.org/jira/browse/HIVE-15664
> Project: Hive
> Issue Type: Bug
> Reporter: Sergey Shelukhin
> Attachments: HIVE-15664.WIP.patch
>
>
> 1) Don't use ORC dictionary.
> 2) Use VectorDeserialize.
> 3) Don't parse the columns that are not included (cannot avoid reading them).
> 4) Send VRB to the pipeline and write ORC in parallel (in background).
> Also add an option to disable the encoding pipeline server-side.