[ 
https://issues.apache.org/jira/browse/SPARK-17815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15560749#comment-15560749
 ] 

Ofir Manor commented on SPARK-17815:
------------------------------------

Thanks Cody, this is much clearer.
(BTW - I've been bitten multiple times by HDFS corrupting files, especially 
with the truncate() API, but that is a different story)
I think we are mixing two different discussions here.
Structured Streaming provides a framework and an algorithm, and expects all 
sources and sinks to align with that. The Kafka source is just one such example 
(and the Kafka sink discussion is about other limits of the current framework).
You have some concerns and reservations regarding the framework - both due to 
the partial implementation so far and due to deeper concerns (mostly complexity 
and its likely effects).
I think the umbrella discussion (Structured Streaming Kafka source) is about 
conforming to the spec. This specific ticket is about an even smaller detail.
Of course, given that so far there were no real opportunities for the deeper, 
architectural discussion (or maybe it is just my perception), it might make 
sense to use every opportunity to try to raise and influence the higher-level 
issues. But I think we should at least be clear about whether we are discussing 
something specific to the Kafka source for Structured Streaming, or things at 
the framework level.
(Your SIP suggestion on the mailing list - if I understand correctly - is 
exactly about enabling that kind of discussion, right?)
Just my two cents.

> Report committed offsets
> ------------------------
>
>                 Key: SPARK-17815
>                 URL: https://issues.apache.org/jira/browse/SPARK-17815
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Michael Armbrust
>
> Since we manage our own offsets, we have turned off auto-commit.  However, 
> this means that external tools are not able to report on how far behind a 
> given streaming job is.  When the user manually gives us a group.id, we 
> should report back to it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

