Fred, I think that's a pretty good summary of my thoughts. Thanks for condensing them :)
Right now, my focus is to get more people using Structured Streaming so that we can get some real-world feedback on what is missing. Right now this means:

- SPARK-15406 <https://issues.apache.org/jira/browse/SPARK-15406> Kafka Support - since this seems to be the source of choice for many users
- SPARK-17731 <https://issues.apache.org/jira/browse/SPARK-17731> Metrics - right now it's pretty hard to see what is going on, and where latency is coming from.

(Rough sketches of what both could look like from the user side are appended at the bottom of this message.)

Once those are in and see some use, I think it'll be easier to prioritize the work on #1.

Relatedly, I'm curious to hear more about the types of questions you are getting. I think the dev list is a good place to discuss applications and if/how Structured Streaming can handle them.

On Wed, Oct 5, 2016 at 3:20 PM, Fred Reiss <freiss....@gmail.com> wrote:

> Thanks for the thoughtful comments, Michael and Shivaram. From what I've
> seen in this thread and on JIRA, it looks like the current plan with
> regard to application-facing APIs for sinks is roughly:
>
> 1. Rewrite incremental query compilation for Structured Streaming.
> 2. Redesign Structured Streaming's source and sink APIs so that they do
>    not depend on RDDs.
> 3. Allow the new APIs to stabilize.
> 4. Open these APIs to use by application code.
>
> Is there a way for those of us who aren't involved in the first two steps
> to get some idea of the current plans and progress? I get asked a lot
> about when Structured Streaming will be a viable replacement for Spark
> Streaming, and I like to be able to give accurate advice.
>
> Fred
>
> On Tue, Oct 4, 2016 at 3:02 PM, Michael Armbrust <mich...@databricks.com>
> wrote:
>
>>> I don't quite understand why exposing it indirectly through a typed
>>> interface should be delayed before finalizing the API.
>>
>> Spark has a long history
>> <https://spark-project.atlassian.net/browse/SPARK-1094> of maintaining
>> binary compatibility in its public APIs. I strongly believe this is one
>> of the things that has made the project successful. Exposing internals
>> that we know are going to change in the primary user-facing API for
>> creating Streaming DataFrames seems directly counter to this goal. I
>> think the argument that "you can do it anyway" fails to capture the
>> expectations of users who probably aren't closely following this
>> discussion.
>>
>> If advanced users want to dig through the code and experiment, great. I
>> hope they report back on what's good and what can be improved. However,
>> if you add the function suggested in the PR to DataStreamReader, you are
>> giving them a bad experience by leaking internals that don't even show
>> up in the published documentation.
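P.S. As promised above, a minimal sketch of what the Kafka source could look like from the user side, assuming SPARK-15406 surfaces through the existing DataStreamReader. The "kafka" format name and the option keys here are my assumptions, not a finalized API (this assumes a `spark` SparkSession is in scope, as in the shell):

    // Sketch only: the format name and option keys are assumptions
    // until SPARK-15406 is merged.
    val kafkaStream = spark
      .readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "host1:9092,host2:9092")
      .option("subscribe", "topic1")
      .load()

    // Keys and values would presumably arrive as binary; cast them
    // to strings before downstream processing.
    val messages = kafkaStream
      .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")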
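And similarly for the metrics work, a sketch of one possible listener-style API for answering "where is the latency coming from". Every name below (the listener class, the event classes, and the progress payload) is an assumption until SPARK-17731 settles:

    import org.apache.spark.sql.streaming.StreamingQueryListener
    import org.apache.spark.sql.streaming.StreamingQueryListener._

    // Sketch only: class and method names are assumptions pending
    // SPARK-17731.
    spark.streams.addListener(new StreamingQueryListener {
      override def onQueryStarted(event: QueryStartedEvent): Unit = {}
      override def onQueryProgress(event: QueryProgressEvent): Unit = {
        // Per-trigger progress: input/processing rates plus per-phase
        // durations, which is what you'd need to locate latency.
        println(event.progress.json)
      }
      override def onQueryTerminated(event: QueryTerminatedEvent): Unit = {}
    })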