[ 
https://issues.apache.org/jira/browse/SPARK-17812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15573395#comment-15573395
 ] 

Cody Koeninger edited comment on SPARK-17812 at 10/13/16 10:25 PM:
-------------------------------------------------------------------

1. we dont have lists, we have strings.  regexes and valid topic names have 
overlaps (dot is the obvious one).

2. Mapping directly to kafka method names means we don't have to come up with 
some other (weird and possibly overlapping) name when they add more ways to 
subscribe, we just use theirs.

3. I think this "starting X mssages" is a mess with kafka semantics for the 
reasons both you and I have already expressed.  At any rate, I think Michael 
already clearly punted the "starting X messages" case to a different ticket.

4. I  think it's more than sufficiently clear as suggested, no one is going to 
expect that a specific offset they provided is going to be overruled by a 
general single default.   The implementation is also crystal clear - seek to 
the position identified by startingTime, then seek to any specific offsets for 
specific partitions

Yes, this is all bikeshedding, but it's bikeshedding that directly affects what 
people are actually able to do with the api.  Needlessly restricting it for 
reasons that have nothing to do with safety is just going to piss users off for 
no reason. Just because you don't have a use case that needs it, doesn't mean 
you should arbitrarily prevent users from doing it.

Please, just choose something and let me build it so that people can actually 
use the thing by the next release....


was (Author: c...@koeninger.org):
1. we dont have lists, we have strings.  regexes and valid topic names have 
overlaps (dot is the obvious one).

2. Mapping directly to kafka method names means we don't have to come up with 
some other (weird and possibly overlapping) name when they add more ways to 
subscribe, we just use theirs.

3. I think this is a mess with kafka semantics for the reasons both you and I 
have already expressed.  At any rate, I think Michael already clearly punted 
the "starting X" case to a different topic.

4. I  think it's more than sufficiently clear as suggested, no one is going to 
expect that a specific offset they provided is going to be overruled by a 
general single default.   The implementation is also crystal clear - seek to 
the position identified by startingTime, then seek to any specific offsets for 
specific partitions

Yes, this is all bikeshedding, but it's bikeshedding that directly affects what 
people are actually able to do with the api.  Needlessly restricting it for 
reasons that have nothing to do with safety is just going to piss users off for 
no reason. Just because you don't have a use case that needs it, doesn't mean 
you should arbitrarily prevent users from doing it.

Please, just choose something and let me build it so that people can actually 
use the thing by the next release....

> More granular control of starting offsets (assign)
> --------------------------------------------------
>
>                 Key: SPARK-17812
>                 URL: https://issues.apache.org/jira/browse/SPARK-17812
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Michael Armbrust
>
> Right now you can only run a Streaming Query starting from either the 
> earliest or latests offsets available at the moment the query is started.  
> Sometimes this is a lot of data.  It would be nice to be able to do the 
> following:
>  - seek to user specified offsets for manually specified topicpartitions



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to