[ 
https://issues.apache.org/jira/browse/HIVE-7052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013349#comment-14013349
 ] 

Prasanth J commented on HIVE-7052:
----------------------------------

+1

> Optimize split calculation time
> -------------------------------
>
>                 Key: HIVE-7052
>                 URL: https://issues.apache.org/jira/browse/HIVE-7052
>             Project: Hive
>          Issue Type: Bug
>         Environment: hive + tez
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>              Labels: performance
>         Attachments: HIVE-7052-profiler-1.png, HIVE-7052-profiler-2.png, 
> HIVE-7052-v3.patch, HIVE-7052-v7.patch
>
>
> When running a TPC-DS query (query_27), a significant amount of time was spent 
> in split computation on a 200 GB dataset (ORC format).
> Profiling revealed that:
> 1. A lot of time was spent in Config's substitutevar (regex) in the 
> HiveInputFormat.getSplits() method.
> 2. A FileSystem was created repeatedly in OrcInputFormat.generateSplitsInfo().
> I will attach the profiler snapshots soon.
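
For readers following the second finding, here is a minimal sketch of the general remedy: create the expensive object once per distinct store and reuse it across paths. This is not taken from the attached patches. Handle, createHandle, storeKey and resolveHandles are hypothetical names, and Handle only stands in for something costly to create, such as a file system client.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SplitSetupSketch {
    /** Hypothetical stand-in for an object that is expensive to create. */
    static final class Handle {
        final String key;
        Handle(String key) { this.key = key; }
    }

    /** Counts how many handles were actually constructed. */
    static int handlesCreated = 0;

    /** Hypothetical factory; in a real system this would be the costly call. */
    static Handle createHandle(String key) {
        handlesCreated++;
        return new Handle(key);
    }

    /** Paths with the same scheme and authority can share one handle. */
    static String storeKey(URI path) {
        return path.getScheme() + "://" + path.getAuthority();
    }

    /** Resolves a handle for every path, creating one per distinct store only. */
    static Map<URI, Handle> resolveHandles(List<URI> paths) {
        Map<String, Handle> cache = new HashMap<>();
        Map<URI, Handle> result = new HashMap<>();
        for (URI p : paths) {
            Handle h = cache.computeIfAbsent(storeKey(p), SplitSetupSketch::createHandle);
            result.put(p, h);
        }
        return result;
    }

    public static void main(String[] args) {
        List<URI> paths = List.of(
            URI.create("hdfs://nn1:8020/warehouse/t/part-0.orc"),
            URI.create("hdfs://nn1:8020/warehouse/t/part-1.orc"),
            URI.create("hdfs://nn1:8020/warehouse/t/part-2.orc"));
        Map<URI, Handle> handles = resolveHandles(paths);
        System.out.println(handles.size() + " paths, " + handlesCreated + " handle(s) created");
    }
}
```

The same idea applies to the first finding: work that depends only on the job configuration, such as regex-based variable substitution, can be done once before the loop over input paths rather than once per path.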



--
This message was sent by Atlassian JIRA
(v6.2#6252)
