[ 
https://issues.apache.org/jira/browse/DRILL-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14955926#comment-14955926
 ] 

ASF GitHub Bot commented on DRILL-3921:
---------------------------------------

Github user vkorukanti commented on the pull request:

    https://github.com/apache/drill/pull/197#issuecomment-147882293
  
    Updated patch LGTM. One thing we could test is whether the ScanBatch is
    receiving two RecordReaders, so we are sure that we are lazily initializing
    the second RR, but currently we don't have test framework support for
    fragment verification.
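
The kind of check described above could be sketched as a counting decorator
wrapped around each reader handed to the scan. The RecordReader interface and
CountingReader class below are simplified stand-ins, not Drill's actual API:

    import java.util.concurrent.atomic.AtomicInteger;

    // Stand-in for the real record reader interface.
    interface RecordReader {
      void setup();
      int next();   // returns number of records read
    }

    // Decorator that records how many readers have actually been set up.
    final class CountingReader implements RecordReader {
      private final RecordReader delegate;
      private final AtomicInteger setupCount;

      CountingReader(RecordReader delegate, AtomicInteger setupCount) {
        this.delegate = delegate;
        this.setupCount = setupCount;
      }

      @Override public void setup() {
        setupCount.incrementAndGet();   // initialization happened for this reader
        delegate.setup();
      }

      @Override public int next() {
        return delegate.next();
      }
    }

A fragment-level test could then hand two CountingReaders to the scan, pull
the first batch, and assert that only one setup() call was recorded.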


> Hive LIMIT 1 queries take too long
> ----------------------------------
>
>                 Key: DRILL-3921
>                 URL: https://issues.apache.org/jira/browse/DRILL-3921
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>            Reporter: Sudheesh Katkam
>            Assignee: Sudheesh Katkam
>
> Fragment initialization on a Hive table (one that is backed by a directory of 
> many files) can take a very long time. This is evident in LIMIT 1 queries. 
> The root cause is that the underlying reader in the HiveRecordReader is 
> initialized when the constructor is called, rather than when setup is called.
> Two changes need to be made:
> 1) lazily initialize the underlying record reader in HiveRecordReader
> 2) allow running a callable as a proxy user within an operator (through 
> OperatorContext). This is required because initialization of the underlying 
> record reader needs to be done as a proxy user (the owner of the file). 
> Previously, this was handled while creating the record batch tree.
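
A minimal sketch of the two changes, assuming simplified stand-ins for the
Drill classes involved (the class and method names below are hypothetical;
only the Hadoop UserGroupInformation calls are real APIs):

    import java.io.IOException;
    import java.security.PrivilegedExceptionAction;

    import org.apache.hadoop.security.UserGroupInformation;

    // Simplified sketch: keep the constructor free of I/O and open the
    // underlying Hive reader in setup(), running as the file owner via
    // Hadoop's proxy-user support.
    public class LazyHiveReaderSketch {

      private final String proxyUserName;   // owner of the file being read
      private Object underlyingReader;      // created lazily, not in the constructor

      public LazyHiveReaderSketch(String proxyUserName) {
        // Only record configuration here, so building many readers for a
        // LIMIT 1 query stays cheap.
        this.proxyUserName = proxyUserName;
      }

      public void setup() throws IOException, InterruptedException {
        UserGroupInformation proxyUgi = UserGroupInformation.createProxyUser(
            proxyUserName, UserGroupInformation.getLoginUser());
        // Expensive initialization happens only when the operator actually
        // needs this reader, and it runs as the proxy user.
        underlyingReader = proxyUgi.doAs(
            (PrivilegedExceptionAction<Object>) this::openUnderlyingReader);
      }

      private Object openUnderlyingReader() {
        // Placeholder for opening the real Hive input format reader.
        return new Object();
      }
    }

With this split, a LIMIT 1 query builds its full set of readers cheaply and
only pays the initialization cost for the reader it actually drains.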



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
