[jira] [Commented] (SPARK-17812) More granular control of starting offsets (assign)

Cody Koeninger (JIRA) Thu, 13 Oct 2016 18:25:12 -0700

    [ https://issues.apache.org/jira/browse/SPARK-17812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15573766#comment-15573766 ]


Cody Koeninger commented on SPARK-17812:
----------------------------------------

OK, failing on start is clear. It's really annoying in the case of 
subscribePattern, but at least it's clear. I think that's enough to get 
started on this ticket. Is anyone currently working on it, or can I take it? 
Ryan seemed worried that it wouldn't get done in time for the next release.

It sounds like your current plan is to ignore whatever comes out of KAFKA-3370, 
which is fine as long as whatever you do is both clear and equally tunable. 
Clarity of semantics can't be the only criterion for an API: "You can only 
start at the latest offset, period" is clear, but it's a crap API.

{quote}
the only case where we lack sufficient tunability is "Where do I go when the 
current offsets are invalid due to retention?".
{quote}

No, you lack sufficient tunability as to where newly discovered partitions 
start.  Keep in mind that those partitions may have been discovered after a 
significant job downtime.  If the point of an API is to provide clear semantics 
to the user, it is not at all clear to me as a user how I can start those 
partitions at latest, which I know is possible in the underlying data model.
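To make the distinction concrete, here is a minimal sketch of a start-offset 
resolver with three separate knobs: explicit offsets for partitions the user 
names, a policy for partitions discovered later (for example after downtime), 
and a policy for requested offsets that retention has already deleted. 
Everything in it (`StartPolicy`, `resolve_start`, the `(earliest, latest)` 
map) is hypothetical and is not Spark's or Kafka's API; it only illustrates 
that newly discovered partitions need a setting of their own.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Hypothetical illustration, not a Spark or Kafka API.
# A partition is identified by (topic, partition number).
TopicPartition = Tuple[str, int]

EARLIEST = "earliest"
LATEST = "latest"


@dataclass
class StartPolicy:
    # Explicit per-partition offsets chosen by the user.
    explicit: Dict[TopicPartition, int] = field(default_factory=dict)
    # Where to start a partition the user never named, e.g. one that
    # appeared while the job was down.
    new_partitions: str = LATEST
    # Where to go when a requested offset is outside the retained range.
    out_of_range: str = EARLIEST


def resolve_start(
    policy: StartPolicy,
    available: Dict[TopicPartition, Tuple[int, int]],
) -> Dict[TopicPartition, int]:
    """Pick a starting offset for every currently available partition.

    `available` maps each partition to its (earliest, latest) offsets,
    where `latest` is the offset of the next message to be written.
    """
    starts = {}
    for tp, (earliest, latest) in available.items():
        if tp in policy.explicit:
            offset = policy.explicit[tp]
            if earliest <= offset <= latest:
                starts[tp] = offset
                continue
            # Requested offset is no longer (or not yet) available.
            mode = policy.out_of_range
        else:
            # Partition was not named by the user at all.
            mode = policy.new_partitions
        starts[tp] = earliest if mode == EARLIEST else latest
    return starts


policy = StartPolicy(explicit={("events", 0): 5, ("events", 1): 2})
available = {
    ("events", 0): (0, 100),  # offset 5 still retained -> used as-is
    ("events", 1): (10, 50),  # offset 2 deleted by retention -> earliest
    ("events", 2): (0, 30),   # partition the user never named -> latest
}
print(resolve_start(policy, available))
# {('events', 0): 5, ('events', 1): 10, ('events', 2): 30}
```

Collapsing `new_partitions` and `out_of_range` into a single earliest/latest 
setting is exactly the conflation the comment objects to: the two cases can 
reasonably want different answers.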

The reason I'm belaboring this point now is that you have chosen names 
(earliest, latest) for the API currently under discussion that are confusingly 
similar to the existing auto offset reset functionality, and you have provided 
knobs for some, but not all, of the things auto offset reset currently affects. 
This is going to confuse people; it already confuses me.



> More granular control of starting offsets (assign)
> --------------------------------------------------
>
>                 Key: SPARK-17812
>                 URL: https://issues.apache.org/jira/browse/SPARK-17812
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Michael Armbrust
>
> Right now you can only run a Streaming Query starting from either the 
> earliest or latest offsets available at the moment the query is started.  
> Sometimes this is a lot of data.  It would be nice to be able to do the 
> following:
>  - seek to user specified offsets for manually specified topicpartitions
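
One way to express "user specified offsets for manually specified 
topicpartitions" is as JSON keyed by topic and then partition. The sketch 
below assumes the format that Spark's Kafka source documents for its `assign` 
and `startingOffsets` options in later releases, where an offset of -2 means 
earliest and -1 means latest. The helpers `assign_json` and 
`starting_offsets_json` are hypothetical.

```python
import json
from typing import Dict, List

# Sentinel offsets assumed from the Kafka source's JSON offset format:
# -2 refers to the earliest offset, -1 to the latest.
EARLIEST = -2
LATEST = -1


def assign_json(partitions: Dict[str, List[int]]) -> str:
    """Hypothetical helper: encode topic -> partition numbers."""
    return json.dumps({topic: sorted(parts) for topic, parts in partitions.items()})


def starting_offsets_json(offsets: Dict[str, Dict[int, int]]) -> str:
    """Hypothetical helper: encode topic -> partition -> offset.

    JSON object keys must be strings, so partition numbers become "0", "1", ...
    """
    return json.dumps(
        {
            topic: {str(p): off for p, off in sorted(parts.items())}
            for topic, parts in offsets.items()
        }
    )


offsets = {"topicA": {0: 23, 1: EARLIEST}, "topicB": {0: LATEST}}
print(assign_json({topic: list(parts) for topic, parts in offsets.items()}))
# {"topicA": [0, 1], "topicB": [0]}
print(starting_offsets_json(offsets))
# {"topicA": {"0": 23, "1": -2}, "topicB": {"0": -1}}
```

Assuming that format, the two strings would be passed as 
`.option("assign", ...)` and `.option("startingOffsets", ...)` on a 
`spark.readStream.format("kafka")` reader.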



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
