[jira] [Closed] (DRILL-3209) [Umbrella] Plan reads of Hive tables as native Drill reads when a native reader for the underlying table format exists

Chun Chang (JIRA) Thu, 01 Oct 2015 11:27:39 -0700

     [ 
https://issues.apache.org/jira/browse/DRILL-3209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Chun Chang closed DRILL-3209.
-----------------------------
    Assignee: Chun Chang  (was: Jason Altekruse)

The following build supports parquet native scan for Hive. 

{noformat}
+-------------------------------------------+---------------------------------------------------+----------------------------+--------------+----------------------------+
|                 commit_id                 |                  commit_message                   |        commit_time         | build_email  |         build_time         |
+-------------------------------------------+---------------------------------------------------+----------------------------+--------------+----------------------------+
| 83ebc7886f1a78e8ccca1a50725a000d3ca928c9  | DRILL-3479: Fix sqlline version for all profiles  | 30.09.2015 @ 20:07:11 UTC  | Unknown      | 30.09.2015 @ 21:30:37 UTC  |
+-------------------------------------------+---------------------------------------------------+----------------------------+--------------+----------------------------+
{noformat}

Using TPCH-100 parquet data on an 11-node cluster (10.10.103.60-70), verified 
that with native scan turned on, Drill can handle a TPC-H query that used to 
run out of memory (05.q below). Also noticed that with native scan turned on, 
performance may suffer. For some queries, performance can be 5-6 times slower.

{noformat}
tpch query      parquet   hive   native scan
19.q            40s       39s    40s
                39s       39s    39s
13.q            35s       50s    345s
01.q            31s       61s    164s
04.q            36s       53s    210s
05.q            42s       oom    110s
06.q            18s       40s    53s
{noformat}
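
To make the slowdown concrete, the ratios can be worked out directly from the table. The sketch below only restates the timings reported above; the function and variable names are made up for illustration, and the unlabelled second row is left out because the table does not say which query it belongs to.

```python
# Timings in seconds, copied from the table above as (parquet, hive, native scan).
# None marks the Hive run that ended in OOM.
timings = {
    "19.q": (40, 39, 40),
    "13.q": (35, 50, 345),
    "01.q": (31, 61, 164),
    "04.q": (36, 53, 210),
    "05.q": (42, None, 110),
    "06.q": (18, 40, 53),
}


def slowdown(native, baseline):
    """Return native / baseline rounded to one decimal, or None if there is no baseline."""
    if baseline is None:
        return None
    return round(native / baseline, 1)


# For each query: (native scan vs. parquet, native scan vs. hive).
ratios = {
    query: (slowdown(native, parquet), slowdown(native, hive))
    for query, (parquet, hive, native) in timings.items()
}

for query, (vs_parquet, vs_hive) in ratios.items():
    print(query, vs_parquet, vs_hive)
```

Read this way, the 5-6x figure matches 01.q and 04.q when native scan is compared with the plain parquet column.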

> [Umbrella] Plan reads of Hive tables as native Drill reads when a native 
> reader for the underlying table format exists
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-3209
>                 URL: https://issues.apache.org/jira/browse/DRILL-3209
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Query Planning & Optimization, Storage - Hive
>            Reporter: Jason Altekruse
>            Assignee: Chun Chang
>             Fix For: 1.2.0
>
>
> All reads against Hive are currently done through the Hive Serde interface. 
> While this provides the most flexibility, the API is not optimized for 
> maximum performance while reading the data into Drill's native data 
> structures. For Parquet and text file backed tables, we can plan these reads 
> as Drill native reads. Currently, reads of these file types provide untyped 
> data. While Parquet has metadata in the file, we currently do not make use of 
> the type information while planning. For text files, we read all of the files 
> as lists of varchars. In both of these cases, casts will need to be injected 
> to provide the same datatypes provided by the reads through the SerDe 
> interface.
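
The cast injection described in the quoted issue text can be pictured with a small sketch. This is not Drill's planner code: the `CASTS` mapping, the `apply_casts` helper, and the sample schema are all hypothetical, and the sketch assumes a text reader that returns every column as a string while the Hive table schema supplies the declared types.

```python
from decimal import Decimal

# Hypothetical mapping from Hive type names to Python converters.
# It stands in for the casts a planner would inject; it is not Drill's real API.
CASTS = {
    "int": int,
    "bigint": int,
    "double": float,
    "decimal": Decimal,
    "string": str,
}


def apply_casts(row, schema):
    """Cast a row of varchar values to the types declared in the table schema."""
    if len(row) != len(schema):
        raise ValueError(
            "row has %d columns, schema has %d" % (len(row), len(schema))
        )
    return [CASTS[type_name](value) for value, (_, type_name) in zip(row, schema)]


# Illustrative schema and row; the native text reader is assumed to hand back
# every column as a string, so the casts restore the declared types.
schema = [("o_orderkey", "int"), ("o_totalprice", "double"), ("o_comment", "string")]
typed = apply_casts(["1", "173665.47", "final deposits"], schema)
print(typed)
```

The point is only that the type information comes from the Hive metastore side, while the values come from the native reader, so the two have to be reconciled per column.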



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
