Wenzhe Zhou updated IMPALA-12377:
---------------------------------
Description: The code that handles count(*) queries in the backend function
DataSourceScanNode::GetNext() is inefficient. Even when the external data
source returns no column data, it still tries to materialize rows and adds them
to the RowBatch one at a time, up to the row count. It also calls
GetNextInputBatch() multiple times (count / batch_size), and each
GetNextInputBatch() call invokes a JNI function. (was: The code that handles
'select count(*)' in the backend function DataSourceScanNode::GetNext() is
inefficient. Even when the external data source returns no column data, it
still tries to materialize rows and adds them to the RowBatch one at a time, up
to the row count. It also calls GetNextInputBatch() multiple times
(count / batch_size), and each GetNextInputBatch() call invokes a JNI
function.)
> Improve count star performance for external data source
> -------------------------------------------------------
>
> Key: IMPALA-12377
> URL: https://issues.apache.org/jira/browse/IMPALA-12377
> Project: IMPALA
> Issue Type: Sub-task
> Components: Backend, Frontend
> Reporter: Wenzhe Zhou
> Assignee: Wenzhe Zhou
> Priority: Major
>
> The code that handles count(*) queries in the backend function
> DataSourceScanNode::GetNext() is inefficient. Even when the external data
> source returns no column data, it still tries to materialize rows and adds
> them to the RowBatch one at a time, up to the row count. It also calls
> GetNextInputBatch() multiple times (count / batch_size), and each
> GetNextInputBatch() call invokes a JNI function.
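
To make the cost described above concrete, here is a minimal, self-contained C++ sketch. It is not Impala code: ScanResult, CountStarRowByRow and CountStarSingleFetch are hypothetical names, and the counters only stand in for GetNextInputBatch() calls and per-row RowBatch additions. The single-fetch variant is an assumption about one possible direction, not the actual change made for this issue.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical bookkeeping for one count(*) scan; not an Impala type.
struct ScanResult {
  int64_t rows_reported = 0;  // row count handed to the parent operator
  int64_t batch_fetches = 0;  // stand-in for GetNextInputBatch() (JNI) calls
  int64_t rows_appended = 0;  // stand-in for per-row RowBatch additions
};

// Models the behavior described in the issue: one input-batch fetch per
// batch_size rows, plus one (empty) row appended for every counted row.
ScanResult CountStarRowByRow(int64_t row_count, int64_t batch_size) {
  assert(batch_size > 0);
  ScanResult result;
  int64_t remaining = row_count;
  while (remaining > 0) {
    ++result.batch_fetches;
    const int64_t rows_in_batch = std::min(remaining, batch_size);
    for (int64_t i = 0; i < rows_in_batch; ++i) {
      ++result.rows_appended;  // per-row work even though no column has data
    }
    result.rows_reported += rows_in_batch;
    remaining -= rows_in_batch;
  }
  return result;
}

// Assumed alternative: obtain the count with a single fetch and report it
// directly, with no per-row materialization.
ScanResult CountStarSingleFetch(int64_t row_count) {
  ScanResult result;
  result.batch_fetches = 1;
  result.rows_reported = row_count;
  return result;
}
```

In this model, a count of 10,000 rows with a batch size of 1,024 costs 10 fetches and 10,000 row additions in the row-by-row variant, while the single-fetch variant reports the same count with 1 fetch and no row additions.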