[ https://issues.apache.org/jira/browse/DRILL-5977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16418387#comment-16418387 ]

B Anil Kumar commented on DRILL-5977:
-------------------------------------

[~aravi5] Thanks for looking into this feature and providing the documentation.

 

Your approach looks good to me. But, just to note, in other storage plugins, 
such as the Mongo plugin, we convert the entire filter condition expression 
(the combination of all predicates) into a Mongo filter. In the case of Kafka, 
that is not possible.

 

So we might need to apply predicate pushdown only in a few cases:
 * If the predicates are on *kafkaMsgOffset* and/or *kafkaMsgTimestamp*. 
 * If a predicate from case 1 is combined with other predicates using AND. 
Example: select * from topic1 where kafkaMsgTimestamp > x AND (v1='' OR v2 = '') 

Queries like select * from topic1 where kafkaMsgTimestamp > x OR eventTimeStamp < y 
can result in a full scan.
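
To make the AND/OR distinction concrete, here is a minimal sketch of that decision rule. It is purely illustrative: the filter-node model and all class/method names below are hypothetical simplifications, not Drill's actual plugin classes.

{noformat}
import java.util.List;

public class KafkaPushdownSketch {

    /** Simplified filter node: either a leaf predicate or an AND/OR over children. */
    static class FilterNode {
        enum Kind { PREDICATE, AND, OR }

        final Kind kind;
        final String column;             // column name, set only for PREDICATE leaves
        final List<FilterNode> children; // sub-expressions, set only for AND/OR nodes

        FilterNode(Kind kind, String column, List<FilterNode> children) {
            this.kind = kind;
            this.column = column;
            this.children = children;
        }
    }

    /** Only these pseudo-columns map onto Kafka offset/timestamp seeks. */
    static boolean isPushableColumn(String column) {
        return "kafkaMsgOffset".equals(column) || "kafkaMsgTimestamp".equals(column);
    }

    /**
     * Cases 1 and 2 above: a scan range can be pushed down when a pushable predicate
     * is combined with the rest of the filter via AND (the remaining predicates are
     * still evaluated by Drill). An OR that mixes a pushable predicate with an
     * arbitrary one cannot narrow the scan, so it falls back to a full scan.
     */
    static boolean canPushDown(FilterNode node) {
        switch (node.kind) {
            case PREDICATE:
                return isPushableColumn(node.column);
            case AND:
                // One pushable conjunct is enough: restricting the scan to its range
                // never drops rows that satisfy the whole AND.
                return node.children.stream().anyMatch(KafkaPushdownSketch::canPushDown);
            case OR:
                // Every branch must be pushable; otherwise rows outside the pushed
                // range could still satisfy the other branch and would be lost.
                return node.children.stream().allMatch(KafkaPushdownSketch::canPushDown);
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        // kafkaMsgTimestamp > x AND (v1 = '' OR v2 = '')  ->  pushdown possible
        FilterNode andCase = new FilterNode(FilterNode.Kind.AND, null, List.of(
                new FilterNode(FilterNode.Kind.PREDICATE, "kafkaMsgTimestamp", null),
                new FilterNode(FilterNode.Kind.OR, null, List.of(
                        new FilterNode(FilterNode.Kind.PREDICATE, "v1", null),
                        new FilterNode(FilterNode.Kind.PREDICATE, "v2", null)))));

        // kafkaMsgTimestamp > x OR eventTimeStamp < y  ->  full scan
        FilterNode orCase = new FilterNode(FilterNode.Kind.OR, null, List.of(
                new FilterNode(FilterNode.Kind.PREDICATE, "kafkaMsgTimestamp", null),
                new FilterNode(FilterNode.Kind.PREDICATE, "eventTimeStamp", null)));

        System.out.println("AND case pushdown: " + canPushDown(andCase)); // true
        System.out.println("OR case pushdown:  " + canPushDown(orCase));  // false
    }
}
{noformat}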

 

 

> predicate pushdown support kafkaMsgOffset
> -----------------------------------------
>
>                 Key: DRILL-5977
>                 URL: https://issues.apache.org/jira/browse/DRILL-5977
>             Project: Apache Drill
>          Issue Type: Improvement
>            Reporter: B Anil Kumar
>            Assignee: Bhallamudi Venkata Siva Kamesh
>            Priority: Major
>             Fix For: 1.14.0
>
>
> As part of Kafka storage plugin review, below is the suggestion from Paul.
> {noformat}
> Does it make sense to provide a way to select a range of messages: a starting 
> point or a count? Perhaps I want to run my query every five minutes, scanning 
> only those messages since the previous scan. Or, I want to limit my take to, 
> say, the next 1000 messages. Could we use a pseudo-column such as 
> "kafkaMsgOffset" for that purpose? Maybe
> SELECT * FROM <some topic> WHERE kafkaMsgOffset > 12345
> {noformat}
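
For illustration only, a sketch of how a pushed-down "kafkaMsgOffset > 12345" predicate could translate to the Kafka consumer API instead of a scan from the start of the partition. This is not the plugin's implementation; the broker address, topic name, partition number, and consumer group below are placeholder assumptions.

{noformat}
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class OffsetSeekSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");    // placeholder broker address
        props.put("group.id", "drill-kafka-sketch");          // placeholder consumer group
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Single partition only, for brevity; a real scan would cover all partitions.
            TopicPartition tp = new TopicPartition("topic1", 0);
            consumer.assign(Collections.singletonList(tp));

            // "kafkaMsgOffset > 12345" becomes a seek to offset 12346 rather than a
            // read from the beginning of the partition.
            consumer.seek(tp, 12345L + 1);

            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
            }
        }
    }
}
{noformat}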



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
