[ https://issues.apache.org/jira/browse/DRILL-5977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16488501#comment-16488501 ]

ASF GitHub Bot commented on DRILL-5977:
---------------------------------------

akumarb2010 commented on issue #1272: DRILL-5977: Filter Pushdown in 
Drill-Kafka plugin
URL: https://github.com/apache/drill/pull/1272#issuecomment-391601936
 
 
   >> Did you mean that we do not apply predicate pushdown for such conditions?
   
   Yes. IMO, users who apply these predicates on offsets should be aware that 
offsets are scoped per partition. So, where an offset predicate comes without a 
partitionId, it can be ignored and the query left as a full scan, since such 
queries are invalid from a Kafka perspective. Please let me know your thoughts.
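   
   To make the ambiguity concrete, below is a minimal sketch against the plain 
Kafka consumer API (the broker address, group id, and topic name my_topic are 
made up for illustration). Each partition keeps its own independent offset 
sequence, so a bare predicate such as kafkaMsgOffset > 12345 would mean 
something different in every partition.
   
{noformat}
import java.util.Collections;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.TopicPartition;

public class OffsetScopeDemo {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092"); // assumed broker
    props.put("group.id", "offset-scope-demo");       // assumed group id
    props.put("key.deserializer",
        "org.apache.kafka.common.serialization.ByteArrayDeserializer");
    props.put("value.deserializer",
        "org.apache.kafka.common.serialization.ByteArrayDeserializer");

    try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
      List<PartitionInfo> partitions = consumer.partitionsFor("my_topic");
      for (PartitionInfo p : partitions) {
        TopicPartition tp = new TopicPartition(p.topic(), p.partition());
        consumer.assign(Collections.singletonList(tp));
        consumer.seekToEnd(Collections.singletonList(tp));
        // Each partition reports its own end offset: offset 12345 in
        // partition 0 is unrelated to offset 12345 in partition 1, which
        // is why an offset predicate needs a partitionId to be meaningful.
        System.out.printf("partition %d: end offset %d%n",
            p.partition(), consumer.position(tp));
      }
    }
  }
}
{noformat}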
   
   >> If we do not pushdown, all partitions will be scanned from startOffset to 
endOffset.
   
   Yes, but such queries are themselves invalid except for topics with a single 
partition, so it is better not to consider them.
   
   >> Are there any drawbacks of applying pushdown on such conditions?
   
   There are no drawbacks as such; my point is simply that we should not handle 
these invalid queries.
   
   >> we can use this predicate pushdown feature for external checkpointing 
mechanism.
   
   Mostly, timestamp-based predicates will be useful for users. The only use 
case I can see for offset-based predicates is checkpointing, and the 
checkpointing strategies used in systems such as Storm, Spark Streaming, Camus, 
etc. all checkpoint on offset per partition.
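   
   As a concrete illustration of that pattern, here is a hedged sketch of how 
an external checkpointing job could resume each partition from its own saved 
offset. The kafka.`my_topic` table path and the checkpoint values are 
assumptions; kafkaMsgOffset appears in this issue, while the kafkaPartitionId 
column name is assumed here.
   
{noformat}
import java.util.LinkedHashMap;
import java.util.Map;

public class CheckpointQueryBuilder {

  // Builds one pushdown-friendly subquery per partition and unions them,
  // so each partition resumes from its own checkpointed offset.
  public static String resumeQuery(String topic, Map<Integer, Long> checkpoints) {
    StringBuilder sql = new StringBuilder();
    String sep = "";
    for (Map.Entry<Integer, Long> e : checkpoints.entrySet()) {
      sql.append(sep)
         .append("SELECT * FROM kafka.`").append(topic).append("`")
         .append(" WHERE kafkaPartitionId = ").append(e.getKey())
         .append(" AND kafkaMsgOffset > ").append(e.getValue());
      sep = "\nUNION ALL\n";
    }
    return sql.toString();
  }

  public static void main(String[] args) {
    Map<Integer, Long> checkpoints = new LinkedHashMap<>();
    checkpoints.put(0, 12345L); // partition 0 resumes after offset 12345
    checkpoints.put(1, 6789L);  // partition 1 has its own, unrelated offset
    System.out.println(resumeQuery("my_topic", checkpoints));
  }
}
{noformat}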
   
   
   >> Even with pushdown we will return empty results. Pushdown is applied to 
each of the predicates 
   independently and merged. The implementation is such that it ensures 
offsetsForTimes is called only for valid partitions (i.e., partitions returned 
by partitionsFor). Hence we will not run into the situation where 
offsetsForTimes blocks infinitely.
   
   Thanks for clarifying it. Good to know that you are already handling these 
cases.
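   
   For anyone following along, here is a minimal sketch of the guard described 
above, written against the plain Kafka consumer API rather than the actual 
Drill code: offsetsForTimes is only asked about partitions that partitionsFor 
returned, so it cannot block waiting on a nonexistent partition.
   
{noformat}
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.TopicPartition;

public class TimestampPushdownSketch {

  // Returns, per valid partition, the earliest offset whose timestamp is at
  // or after timestampMs. Partitions not reported by partitionsFor are never
  // passed to offsetsForTimes.
  public static Map<TopicPartition, Long> offsetsAfter(
      Consumer<?, ?> consumer, String topic, long timestampMs) {
    Map<TopicPartition, Long> request = new HashMap<>();
    List<PartitionInfo> validPartitions = consumer.partitionsFor(topic);
    for (PartitionInfo p : validPartitions) {
      request.put(new TopicPartition(topic, p.partition()), timestampMs);
    }
    Map<TopicPartition, Long> startOffsets = new HashMap<>();
    for (Map.Entry<TopicPartition, OffsetAndTimestamp> e :
        consumer.offsetsForTimes(request).entrySet()) {
      if (e.getValue() != null) { // null when no message at/after the timestamp
        startOffsets.put(e.getKey(), e.getValue().offset());
      }
    }
    return startOffsets;
  }
}
{noformat}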
   
   >> I will add test case for this situation. (I have already added cases for 
predicates with invalid offsets and timestamps.)
   
   Thanks @aravi5 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> predicate pushdown support kafkaMsgOffset
> -----------------------------------------
>
>                 Key: DRILL-5977
>                 URL: https://issues.apache.org/jira/browse/DRILL-5977
>             Project: Apache Drill
>          Issue Type: Improvement
>            Reporter: B Anil Kumar
>            Assignee: Abhishek Ravi
>            Priority: Major
>             Fix For: 1.14.0
>
>
> As part of Kafka storage plugin review, below is the suggestion from Paul.
> {noformat}
> Does it make sense to provide a way to select a range of messages: a starting 
> point or a count? Perhaps I want to run my query every five minutes, scanning 
> only those messages since the previous scan. Or, I want to limit my take to, 
> say, the next 1000 messages. Could we use a pseudo-column such as 
> "kafkaMsgOffset" for that purpose? Maybe
> SELECT * FROM <some topic> WHERE kafkaMsgOffset > 12345
> {noformat}


