[
https://issues.apache.org/jira/browse/DRILL-5977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16488501#comment-16488501
]
ASF GitHub Bot commented on DRILL-5977:
---------------------------------------
akumarb2010 commented on issue #1272: DRILL-5977: Filter Pushdown in
Drill-Kafka plugin
URL: https://github.com/apache/drill/pull/1272#issuecomment-391601936
>> Did you mean that we do not apply predicate pushdown for such conditions?
Yes. IMO, users who apply predicates on offsets should be aware that offsets are scoped per partition. So, in cases where offset predicates appear without a partitionId, they can be ignored and the query can fall back to a full scan, since such queries are invalid from a Kafka perspective. Please let me know your thoughts.
>> If we do not pushdown, all partitions will be scanned from startOffset to
endOffset.
Yes, but queries of this kind are themselves invalid except for topics with a single partition. So it is better not to consider them.
>> Are there any drawbacks of applying pushdown on such conditions?
There are no drawbacks, but my point is that we should not handle these invalid queries.
>> we can use this predicate pushdown feature for external checkpointing
mechanism.
Mostly, timestamp-based predicates will be useful for users. The only use case I can see for offset-based predicates is checkpointing, and the checkpointing strategies used in various systems like Storm, Spark Streaming, Camus, etc. all checkpoint offsets per partition.
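A timestamp-based predicate of the kind described above might look like this (a sketch only; `kafkaMsgTimestamp` is an assumed metadata-column name, with the value being an epoch-millisecond timestamp as Kafka records it):
{noformat}
-- Scan only messages produced after a given point in time,
-- e.g. for a query that runs every five minutes.
SELECT * FROM kafka.`some_topic`
WHERE kafkaMsgTimestamp > 1527200000000;
{noformat}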
>> Even with pushdown we will return empty results. Pushdown is applied to
each of the predicates
independently and merged. The implementation is such that it ensures
offsetsForTimes is called only for valid partitions (i.e., partitions returned
by partitionsFor). Hence we will not run into the situation where
offsetsForTimes blocks infinitely.
Thanks for clarifying it. Good to know that you are already handling these
cases.
>> I will add test case for this situation. (I have already added cases for
predicates with invalid offsets and timestamps.)
Thanks @aravi5
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
> predicate pushdown support kafkaMsgOffset
> -----------------------------------------
>
> Key: DRILL-5977
> URL: https://issues.apache.org/jira/browse/DRILL-5977
> Project: Apache Drill
> Issue Type: Improvement
> Reporter: B Anil Kumar
> Assignee: Abhishek Ravi
> Priority: Major
> Fix For: 1.14.0
>
>
> As part of Kafka storage plugin review, below is the suggestion from Paul.
> {noformat}
> Does it make sense to provide a way to select a range of messages: a starting
> point or a count? Perhaps I want to run my query every five minutes, scanning
> only those messages since the previous scan. Or, I want to limit my take to,
> say, the next 1000 messages. Could we use a pseudo-column such as
> "kafkaMsgOffset" for that purpose? Maybe
> SELECT * FROM <some topic> WHERE kafkaMsgOffset > 12345
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)