[ 
https://issues.apache.org/jira/browse/IMPALA-12896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17826895#comment-17826895
 ] 

Wenzhe Zhou commented on IMPALA-12896:
--------------------------------------

Planner.checkForSmallQueryOptimization() uses MaxRowsProcessedVisitor to 
compute maxRowsProcessed_ from the nodes in the plan tree.  For a 
DataSourceScanNode, numRows (caller.getInputCardinality()) equals 0, the table 
has no stats, and most queries do not have a simple 'limit'. So 
MaxRowsProcessedVisitor.visit() sets valid_ to false.  This causes the Planner 
to create a distributed plan for queries on JDBC tables.  The merged patch 
changes MaxRowsProcessedVisitor.visit() to handle DataSourceScanNode and 
estimate its numRows as 0. If all the scan nodes are DataSourceScanNodes, then 
maxRowsProcessed_ is determined by the non-scan nodes, which makes the Planner 
more likely to create a single-fragment plan.
Should we create a distributed plan for queries that join multiple JDBC 
tables? 
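
To make the behavior above concrete, here is a minimal, simplified model of the visitor logic. It is not the actual Impala code: the nested Node class, the Kind enum, and the run() helper are hypothetical stand-ins, and the checks are reduced to the ones mentioned in this comment (input cardinality, missing stats, simple limit).

```java
import java.util.Arrays;
import java.util.List;

// Simplified sketch of the visitor behavior described above.
// All names here are illustrative; this is not Impala's planner API.
class MaxRowsProcessedSketch {
  enum Kind { DATA_SOURCE_SCAN, OTHER_SCAN, NON_SCAN }

  // Hypothetical stand-in for a plan node.
  static class Node {
    final Kind kind;
    final long inputCardinality;  // -1 means "unknown"
    final boolean missingStats;
    final boolean hasSimpleLimit;

    Node(Kind kind, long inputCardinality, boolean missingStats,
        boolean hasSimpleLimit) {
      this.kind = kind;
      this.inputCardinality = inputCardinality;
      this.missingStats = missingStats;
      this.hasSimpleLimit = hasSimpleLimit;
    }
  }

  private boolean valid_ = true;
  private long maxRowsProcessed_ = 0;

  void visit(Node node) {
    if (!valid_) return;
    long numRows;
    if (node.kind == Kind.DATA_SOURCE_SCAN) {
      // Patched behavior as described: estimate 0 rows and skip the
      // stats / simple-limit check, so the estimate stays valid.
      numRows = 0;
    } else if (node.kind == Kind.OTHER_SCAN) {
      // Other scans: an unknown cardinality, or missing stats without a
      // simple limit, invalidates the estimate.
      if (node.inputCardinality == -1
          || (node.missingStats && !node.hasSimpleLimit)) {
        valid_ = false;
        return;
      }
      numRows = node.inputCardinality;
    } else {
      if (node.inputCardinality == -1) {
        valid_ = false;
        return;
      }
      numRows = node.inputCardinality;
    }
    maxRowsProcessed_ = Math.max(maxRowsProcessed_, numRows);
  }

  boolean isValid() { return valid_; }
  long getMaxRowsProcessed() { return maxRowsProcessed_; }

  // Visits every node of a (flattened) plan and returns the visitor.
  static MaxRowsProcessedSketch run(List<Node> plan) {
    MaxRowsProcessedSketch visitor = new MaxRowsProcessedSketch();
    for (Node n : plan) visitor.visit(n);
    return visitor;
  }

  public static void main(String[] args) {
    // Two JDBC scans without stats, joined by a non-scan node.
    List<Node> plan = Arrays.asList(
        new Node(Kind.DATA_SOURCE_SCAN, 0, true, false),
        new Node(Kind.DATA_SOURCE_SCAN, 0, true, false),
        new Node(Kind.NON_SCAN, 100, false, false));
    MaxRowsProcessedSketch v = run(plan);
    System.out.println("valid=" + v.isValid()
        + " maxRowsProcessed=" + v.getMaxRowsProcessed());
  }
}
```

In this model, a plan whose only scans are DataSourceScanNodes keeps a valid estimate that comes entirely from the non-scan nodes, which is the situation the question about joins on multiple JDBC tables refers to.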

> Avoid JDBC table to be set as transactional table
> -------------------------------------------------
>
>                 Key: IMPALA-12896
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12896
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Frontend
>            Reporter: Wenzhe Zhou
>            Assignee: Wenzhe Zhou
>            Priority: Major
>             Fix For: Impala 4.4.0
>
>
> Found the following issues in downstream integration.
> 1) JDBC tables created in some deployment environments were set as 
> transactional tables by default. This caused catalogd to fail to load the 
> metadata for JDBC tables. We have to explicitly set the table property 
> "transactional=false" for JDBC tables.
> 2) The FileSystemUtil.copyFileFromUriToLocal() function wrote a log message 
> only for IOException. We should write a log message for all types of 
> exceptions so that we can capture the errors that cause failures to load 
> JDBC drivers. 
> 3) Operations on JDBC tables are processed only on the coordinator. The 
> planner should estimate the processed rows as 0 for DataSourceScanNode so 
> that coordinator-only query plans are generated for simple queries on JDBC 
> tables and the queries can be executed without invoking executor nodes. 
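
As an illustration of item 2 in the quoted description, the following is a minimal sketch of logging every exception type rather than only IOException. It does not reproduce FileSystemUtil.copyFileFromUriToLocal(); the class name, the runAndLog() helper, and the loggedFailures counter are hypothetical, and java.util.logging is used only to keep the example self-contained.

```java
import java.util.concurrent.Callable;
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch only: wraps an action so that any exception is logged before
// being rethrown. Names are illustrative, not taken from Impala.
class LogAllFailuresSketch {
  private static final Logger LOG =
      Logger.getLogger(LogAllFailuresSketch.class.getName());

  // Counts logged failures so the behavior can be checked in a test.
  static int loggedFailures = 0;

  static <T> T runAndLog(String description, Callable<T> action)
      throws Exception {
    try {
      return action.call();
    } catch (Exception e) {
      // Catching Exception (not just IOException) means runtime failures
      // also leave a log message behind.
      loggedFailures++;
      LOG.log(Level.SEVERE, "Failed to " + description, e);
      throw e;
    }
  }

  public static void main(String[] args) {
    Callable<String> failing = () -> {
      throw new IllegalStateException("not an IOException");
    };
    try {
      runAndLog("copy driver jar", failing);
    } catch (Exception e) {
      System.out.println("caught after logging: " + e.getMessage());
    }
  }
}
```

The exception is rethrown unchanged, so callers see the same failure as before; only the logging coverage differs.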



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

