Awesome, I'm glad that others think this is a good idea. I'd love a review
of the initial PR that splits the guide apart
<https://github.com/apache/spark/pull/47864> if you have a chance.

And I have plans to break up the guide and refine the content even further;
my personal Structured Streaming docs site
<https://structured-streaming.vercel.app/> is what I'd like our official
docs to head towards. There are operator-specific pages, dedicated pages
for configs like output mode/triggers, e2e examples, etc.

Neil




On Sun, Aug 25, 2024 at 7:27 PM Anish Shrigondekar <
anish.shrigonde...@databricks.com> wrote:

> +1 - I think it would be super helpful to split this single long doc.
>
> Couple of points that might be useful:
> - If we could create an individual page per operator with some concrete
> examples, that would be great
> - As Jungtaek mentioned, separating the API reference from the conceptual
> portions might also be helpful
> - If we could keep allied streaming docs (state data source reader, Kafka
> connector, etc.) easy to reference and following a similar pattern, that
> would also be great
>
> As Jungtaek mentioned, this doesn't have to be part of a single change.
> We can split these changes over time.
>
> Thanks,
> Anish
>
>
>
>
> On Sun, Aug 25, 2024 at 6:17 PM Jungtaek Lim <kabhwan.opensou...@gmail.com>
> wrote:
>
>> Sorry for the late reply. I strongly support the initial move of
>> splitting one extremely long guide page into multiple pages.
>>
>> Anyone who has looked at the SS guide page would agree that no one can
>> simply say "RTFM" to someone learning SS, or use the page as a
>> reference. The Chrome plugin "The Read Time" tells me that *the SS guide
>> page for Spark 3.5.2 has 22655 words and takes 100 mins to read*. (Readers
>> will still need to read the SS + Kafka guide page as well in many cases.)
>>
>> Given the learning curve for newcomers, the actual reading time is
>> probably 1.5x to 2x that for them (and even that may not be sufficient).
>> Some content is conceptual while other content is almost purely
>> reference, so it's definitely helpful to start splitting the page into
>> multiple pages. It'd be ideal if we could separate the conceptual and
>> quick-start content from the reference content and place each properly,
>> so that users can read at their own pace and according to their needs,
>> but it doesn't need to be done all at once.
>>
>> On Fri, Aug 23, 2024 at 10:54 PM Neil Ramaswamy <n...@ramaswamy.org>
>> wrote:
>>
>>> Since it's been over 72 hours with no objections, I'm going to make a PR
>>> with this change. If you have any specific opinions, we can discuss them on
>>> GitHub.
>>>
>>> Neil
>>>
>>> On Tue, Aug 20, 2024 at 12:11 AM Neil Ramaswamy <n...@ramaswamy.org>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> A few months ago, I started a thread about migrating our programming
>>>> guides to be versionless. I had a POC, and the mostly-positive reception on
>>>> the thread encouraged me to implement it for real.
>>>>
>>>> I did that recently here
>>>> <https://github.com/neilramaswamy/spark-website/pull/2>, but there
>>>> were a few critical issues: some guides (like MLlib) reference code
>>>> examples in the apache/spark repo itself, and the SQL reference directly
>>>> references the generated API reference using a Jekyll Liquid tag called
>>>> include_api_gen. I think these are non-starters unless there is significant
>>>> community interest.
>>>>
>>>> One of the motivations for versionless guides was to be able to iterate
>>>> quickly, avoiding large, SEO-impacting changes. However, given the
>>>> challenges that versionless guides pose, I think it's better to just break
>>>> apart the large guides, like the Structured Streaming one, and hope that
>>>> they rank well in Spark 4.0.0+.
>>>>
>>>> To that end, I've broken apart the Structured Streaming Programming
>>>> Guide; it now resembles the MLlib and SQL reference guides. Critically, I
>>>> have not changed *any* content. This work should make it easier for us
>>>> to better paginate and structure our Structured Streaming docs in the
>>>> future, which will make it easier for our users to consume. This is
>>>> especially important because similar tools like Flink do a much nicer job
>>>> of organizing content.
>>>>
>>>> You can view the changes on my personal site here
>>>> <https://nr-spark-site.vercel.app/streaming/index.html>, and you can
>>>> see the code changes here
>>>> <https://github.com/neilramaswamy/nr-spark/pull/6>. Please let me know
>>>> what you think; if there's no major objection, I will create a ticket and
>>>> submit the PR.
>>>>
>>>> Best,
>>>> Neil
>>>>
>>>
