Michael Armbrust commented on SPARK-17812:

Please do work on it.  It might be good to update the description with a 
summary of this discussion so we can all be sure we are on the same page.

I actually do think it's fair to have one configuration for what to do in the 
case of data loss.  This happens when you fall behind, or when you come back and 
new partitions are there that have already aged out.  Let's add this in another 
ticket.

I know you are super deep in Kafka, and others should chime in if I'm way 
off-base, but I think that {{startingOffsets=earliest}} and 
{{startingOffsets=latest}} make it pretty clear what is happening.  I would not 
change {{earliest}} and {{latest}} just to be different from Kafka.  We could 
make it query start if this is still confusing, but let's do that soon if that 
is the case.
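To make the "user specified offsets" idea concrete, here is a minimal sketch of one way the option value could be encoded: a JSON string mapping topics to per-partition offsets, with sentinel values for earliest/latest. The {{OffsetJson}} helper, the exact JSON shape, and the -1/-2 sentinels are assumptions for illustration, not a committed API.

```scala
// Hypothetical sketch: build a JSON string of per-partition starting offsets
// that a startingOffsets-style option could accept. The format (topic ->
// {partition -> offset}) and the sentinels (-1 = latest, -2 = earliest)
// are assumptions, not the actual Spark/Kafka source API.
object OffsetJson {
  def render(offsets: Map[String, Map[Int, Long]]): String =
    offsets.map { case (topic, parts) =>
      // Sort partitions so the output is deterministic.
      val inner = parts.toSeq.sortBy(_._1)
        .map { case (partition, offset) => s""""$partition":$offset""" }
        .mkString(",")
      s""""$topic":{$inner}"""
    }.mkString("{", ",", "}")
}
```

For example, {{OffsetJson.render(Map("topic1" -> Map(0 -> 23L, 1 -> -1L)))}} would produce {{{"topic1":{"0":23,"1":-1}}}}: partition 0 seeks to offset 23 while partition 1 starts from latest.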

> More granular control of starting offsets (assign)
> --------------------------------------------------
>                 Key: SPARK-17812
>                 URL: https://issues.apache.org/jira/browse/SPARK-17812
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Michael Armbrust
> Right now you can only run a Streaming Query starting from either the 
> earliest or latest offsets available at the moment the query is started.  
> Sometimes this is a lot of data.  It would be nice to be able to do the 
> following:
>  - seek to user-specified offsets for manually specified topic-partitions
