Sorry for the late reply. I strongly support the initial move of splitting the extremely long single-page guide doc into multiple pages.
If anyone ever looks at the SS guide doc page, they would agree that nobody can simply say "RTFM" to someone learning SS, nor use the page as a reference. The Chrome plugin "The Read Time" tells me that *the SS guide page for Spark 3.5.2 has 22655 words and takes 100 mins to read*. (In many cases they will still need to read the SS + Kafka guide page as well.) Given the nature of the content (there is a learning curve for newcomers), it probably takes newcomers 1.5x ~ 2x that long, and even that may be an underestimate. Some of the content is conceptual while other content is almost purely reference, so it's definitely helpful to start splitting the page into multiple pages.

It'd be ideal if we could separate the conceptual + quick start content from the reference content and place each properly, so that users can read at their own pace and according to their needs, but that doesn't need to be done all at once.

On Fri, Aug 23, 2024 at 10:54 PM Neil Ramaswamy <n...@ramaswamy.org> wrote:

> Since it's been over 72 hours with no objections, I'm going to make a PR
> with this change. If you have any specific opinions, we can discuss them on
> GitHub.
>
> Neil
>
> On Tue, Aug 20, 2024 at 12:11 AM Neil Ramaswamy <n...@ramaswamy.org>
> wrote:
>
>> Hi all,
>>
>> A few months ago, I started a thread about migrating our programming
>> guides to be versionless. I had a POC, and the mostly-positive reception on
>> the thread encouraged me to implement it for real.
>>
>> I did that recently here
>> <https://github.com/neilramaswamy/spark-website/pull/2>, but there were
>> a few critical issues: some guides (like MLlib) reference code examples in
>> the apache/spark repo itself, and the SQL reference directly references the
>> generated API reference using a Jekyll Liquid tag called include_api_gen. I
>> think these are non-starters unless there is significant community interest.
>>
>> One of the motivations for versionless guides was to be able to quickly
>> iterate to avoid large, SEO-impacting changes. However, with the challenge
>> that versionless poses, I think it's better to just break apart the large
>> guides, like the Structured Streaming one, and just hope that they rank
>> well in Spark 4.0.0+.
>>
>> To that end, I've broken apart the Structured Streaming Programming
>> Guide; it now resembles the MLlib and SQL reference guides. Critically, I
>> have not changed *any* content. This work should make it easier for us
>> to better paginate and structure our Structured Streaming docs in the
>> future, which will make it easier for our users to consume. This is
>> especially important because similar tools like Flink do a much nicer job
>> of organizing content.
>>
>> You can view the changes on my personal site here
>> <https://nr-spark-site.vercel.app/streaming/index.html>, and you can see
>> the code changes here <https://github.com/neilramaswamy/nr-spark/pull/6>.
>> Please let me know what you think; if there's no major objection, I will
>> create a ticket and submit the PR.
>>
>> Best,
>> Neil
>>
>