[ 
https://issues.apache.org/jira/browse/HIVE-18810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16379624#comment-16379624
 ] 

Thejas M Nair commented on HIVE-18810:
--------------------------------------

Please use the mailing list for questions.
You can find benchmark data on ORC and Parquet performance in these slides -
https://www.slideshare.net/oom65/file-format-benchmarks-avro-json-orc-parquet

In general, reads are where you would see the performance improvement, and you
would see much more improvement when ORC is combined with Hive vectorization.
Use the Tez execution engine for better performance, and for an even bigger
improvement, use LLAP for execution.
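The recommendations above can be sketched as a Hive session setup. This is a minimal sketch, not a tuned configuration: the table and column names are hypothetical, and the defaults for these settings vary across Hive versions.

```sql
-- Store the table as ORC (hypothetical table and column names).
CREATE TABLE sales_orc (id BIGINT, amount DOUBLE)
STORED AS ORC;

-- Enable vectorized query execution (processes rows in batches).
SET hive.vectorized.execution.enabled=true;

-- Use the Tez execution engine instead of classic MapReduce.
SET hive.execution.engine=tez;

-- Reads of the ORC table now benefit from vectorization on Tez;
-- for further gains, run the same queries on an LLAP-enabled cluster.
SELECT COUNT(*) FROM sales_orc;
```

Note that vectorization applies to reads of ORC (and, in later versions, Parquet) data, which is why the combination of ORC storage plus vectorized execution is where the read-side speedup shows up.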


> Parquet Or ORC
> --------------
>
>                 Key: HIVE-18810
>                 URL: https://issues.apache.org/jira/browse/HIVE-18810
>             Project: Hive
>          Issue Type: Test
>          Components: Hive
>    Affects Versions: 1.1.0
>         Environment: Hadoop 1.2.1
> Hive 1.1
>            Reporter: Suddhasatwa Bhaumik
>            Priority: Major
>
> Hello Experts, 
> I would like to know for which data types (based on the size and complexity 
> of the data) one should use Parquet or ORC tables in Hive. E.g., on Hadoop 
> 0.20.0 with Hive 0.13, the performance of ORC tables in Hive is very good 
> even when accessed by 3rd-party BI systems like SAP BusinessObjects or 
> Tableau; performing the same tests on Hadoop 1.2.1 with Hive 1.1 does not 
> yield such reliability in queries: although ETL and insert/update of tables 
> take nominal time, the read performance is not within acceptable limits. 
> In case of any queries, kindly advise. 
> Thanks
> [~suddhasatwa_bhaumik]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)