(spark-website) branch asf-site updated: Improve document for IDE support and simplify doc build (#654)

yao Thu, 25 Dec 2025 20:09:44 -0800

This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git



The following commit(s) were added to refs/heads/asf-site by this push:
     new 90c156eba3 Improve document for IDE support and simplify doc build (#654)
90c156eba3 is described below

commit 90c156eba3c2b5927c8ff40b32b1a9e668391d39
Author: Wenchen Fan <[email protected]>
AuthorDate: Fri Dec 26 12:09:28 2025 +0800

    Improve document for IDE support and simplify doc build (#654)
    
    * Update developer-tools.md
    
    * Update README.md
    
    * Create run-in-container.sh
    
    * Update developer-tools.html
    
    * fix
    
    * better
    
    * Update README.md
    
    * improve
    
    * Apply suggestions from code review
    
    * final
    
    * address comments
---
 .dev/build-docs.sh        | 24 ++++++++++++++++++++++++
 .dev/run-in-container.sh  | 35 +++++++++++++++++++++++++++++++++++
 README.md                 | 42 +++++++++++++++++-------------------------
 developer-tools.md        | 21 ++++++++++++++++++---
 site/developer-tools.html | 23 ++++++++++++++++++++---
 site/sitemap.xml          | 14 +++++++-------
 6 files changed, 121 insertions(+), 38 deletions(-)

diff --git a/.dev/build-docs.sh b/.dev/build-docs.sh
new file mode 100644
index 0000000000..c4297b4177
--- /dev/null
+++ b/.dev/build-docs.sh
@@ -0,0 +1,24 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+docker run \
+  -e HOST_UID=$(id -u) \
+  -e HOST_GID=$(id -g) \
+  --mount type=bind,source="$PWD",target="/spark-website" \
+  -w /spark-website \
+  docs-builder:latest \
+  /bin/bash -c "sh .dev/run-in-container.sh"
diff --git a/.dev/run-in-container.sh b/.dev/run-in-container.sh
new file mode 100644
index 0000000000..1ba306d629
--- /dev/null
+++ b/.dev/run-in-container.sh
@@ -0,0 +1,35 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# 1.Set env variable.
+export JAVA_HOME=/usr/lib/jvm/java-17-openjdk-arm64
+export PATH=$JAVA_HOME/bin:$PATH
+
+# 2.Install bundler.
+gem install bundler -v 2.4.22
+bundle install
+
+# 3. Create a user matching the host UID/GID
+groupadd -g $HOST_GID docuser
+useradd -u $HOST_UID -g $HOST_GID -m docuser
+
+# We need this link to make sure `python3` points to `python3.11` which contains the prerequisite packages.
+ln -s "$(which python3.11)" "/usr/local/bin/python3"
+
+# Build docs
+rm -rf .jekyll-cache
+su docuser -c "bundle exec jekyll build"
diff --git a/README.md b/README.md
index 7d051f074a..2e3e003dc6 100644
--- a/README.md
+++ b/README.md
@@ -3,31 +3,23 @@
 In this directory you will find text files formatted using Markdown, with an `.md` suffix.
 
 Building the site requires [Ruby 3](https://www.ruby-lang.org), [Jekyll](http://jekyllrb.com/docs), and
-[Rouge](https://github.com/rouge-ruby/rouge).
-The easiest way to install the right version of these tools is using
-[Bundler](https://bundler.io/) and running `bundle install` in this directory.
-
-See also [https://github.com/apache/spark/blob/master/docs/README.md](https://github.com/apache/spark/blob/master/docs/README.md)
-
-A site build will update the directories and files in the `site` directory with the generated files.
-Using Jekyll via `bundle exec jekyll` locks it to the right version.
-So after this you can generate the html website by running `bundle exec jekyll build` in this
-directory. Use the `--watch` flag to have jekyll recompile your files as you save changes.
-
-In addition to generating the site as HTML from the Markdown files, jekyll can serve the site via
-a web server. To build the site and run a web server use the command `bundle exec jekyll serve` which runs
-the web server on port 4000, then visit the site at http://localhost:4000.
-
-Please make sure you always run `bundle exec jekyll build` after testing your changes with
-`bundle exec jekyll serve`, otherwise you end up with broken links in a few places.
-
-## Updating Jekyll version
-
-To update `Jekyll` or any other gem please follow these steps:
-
-1. Update the version in the `Gemfile`
-1. Run `bundle update` which updates the `Gemfile.lock`
-1. Commit both files
+[Rouge](https://github.com/rouge-ruby/rouge). The most reliable way to ensure a compatible environment
+is to use the official Docker build image from the Apache Spark repository.
+
+If you haven't already, clone the [Apache Spark](https://github.com/apache/spark) repository. Navigate to
+the Spark root directory and run the following command to create the builder image:
+```
+docker build \
+  --tag docs-builder:latest \
+  --file dev/spark-test-image/docs/Dockerfile \
+  dev/spark-test-image-util/docs/
+```
+
+Once the image is built, navigate to the `spark-website` root directory, run the script which processes
+the Markdown files in the Docker container.
+```
+SPARK_WEBSITE_PATH="/path/to/spark-website" sh .dev/build-docs.sh
+```
 
 ## Docs sub-dir
 
diff --git a/developer-tools.md b/developer-tools.md
index bce821d8c6..0908cef343 100644
--- a/developer-tools.md
+++ b/developer-tools.md
@@ -352,17 +352,32 @@ By default, this script will format files that differ from git master. For more
 
 <h3>IDE setup</h3>
 
+Make sure you have a clean start before setting up the IDE: A clean git clone of the Spark repo, install the latest
+version of the IDE.
+
+If something goes wrong, clear the build outputs by `./build/sbt clean` and `./build/mvn clean`, clear the m2
+cache by `rm -rf ~/.m2/repository/*`, re-import the project into the IDE cleanly and try again.
+
 <h4>IntelliJ</h4>
 
 While many of the Spark developers use SBT or Maven on the command line, the most common IDE we
-use is IntelliJ IDEA. You can get the community edition for free (Apache committers can get
-free IntelliJ Ultimate Edition licenses) and install the JetBrains Scala plugin from `Preferences > Plugins`.
+use is IntelliJ IDEA. You need to install the JetBrains Scala plugin from `Preferences > Plugins`.
+
+Due to the complexity of Spark build, please modify the following global settings of IntelliJ IDEA:
+
+- Go to `Settings -> Build, Execution, Deployment -> Build Tools -> Maven -> Importing`, make sure you
+choose "Detect automatically" for `Generated source folders`, and choose "generate sources" for
+`Phase to be used for folders update`.
+- Go to `Settings -> Build, Execution, Deployment -> Compiler -> Scala Compiler -> Scala Compiler Server`,
+pick a large enough number for `Maximum heap size, MB`, such as "4000".
 
 To create a Spark project for IntelliJ:
 
 - Download IntelliJ and install the 
 <a href="https://confluence.jetbrains.com/display/SCA/Scala+Plugin+for+IntelliJ+IDEA">Scala
 plug-in for IntelliJ</a>.
-- Go to `File -> Import Project`, locate the spark source directory, and select "Maven Project".
+- Go to `File -> Import Project`, locate the spark source directory, and select "Maven Project". It's important to
+pick Maven instead of sbt here, as Spark has complicated building logic that is implemented for sbt using Scala code
+in `SparkBuilder.scala`, and IntelliJ IDEA cannot understand it well.
 - In the Import wizard, it's fine to leave settings at their default. However it is usually useful
 to enable "Import Maven projects automatically", since changes to the project structure will
 automatically update the IntelliJ project.
diff --git a/site/developer-tools.html b/site/developer-tools.html
index fa874b50f3..574ef1f201 100644
--- a/site/developer-tools.html
+++ b/site/developer-tools.html
@@ -481,18 +481,35 @@ your code.  It can be configured to match the import ordering from the style gui
 
 <h3>IDE setup</h3>
 
+<p>Make sure you have a clean start before setting up the IDE: A clean git clone of the Spark repo, install the latest
+version of the IDE.</p>
+
+<p>If something goes wrong, clear the build outputs by <code class="language-plaintext highlighter-rouge">./build/sbt clean</code> and <code class="language-plaintext highlighter-rouge">./build/mvn clean</code>, clear the m2
+cache by <code class="language-plaintext highlighter-rouge">rm -rf ~/.m2/repository/*</code>, re-import the project into the IDE cleanly and try again.</p>
+
 <h4>IntelliJ</h4>
 
 <p>While many of the Spark developers use SBT or Maven on the command line, the most common IDE we
-use is IntelliJ IDEA. You can get the community edition for free (Apache committers can get
-free IntelliJ Ultimate Edition licenses) and install the JetBrains Scala plugin from <code class="language-plaintext highlighter-rouge">Preferences &gt; Plugins</code>.</p>
+use is IntelliJ IDEA. You need to install the JetBrains Scala plugin from <code class="language-plaintext highlighter-rouge">Preferences &gt; Plugins</code>.</p>
+
+<p>Due to the complexity of Spark build, please modify the following global settings of IntelliJ IDEA:</p>
+
+<ul>
+  <li>Go to <code class="language-plaintext highlighter-rouge">Settings -&gt; Build, Execution, Deployment -&gt; Build Tools -&gt; Maven -&gt; Importing</code>, make sure you
+choose &#8220;Detect automatically&#8221; for <code class="language-plaintext highlighter-rouge">Generated source folders</code>, and choose &#8220;generate sources&#8221; for
+<code class="language-plaintext highlighter-rouge">Phase to be used for folders update</code>.</li>
+  <li>Go to <code class="language-plaintext highlighter-rouge">Settings -&gt; Build, Execution, Deployment -&gt; Compiler -&gt; Scala Compiler -&gt; Scala Compiler Server</code>,
+pick a large enough number for <code class="language-plaintext highlighter-rouge">Maximum heap size, MB</code>, such as &#8220;4000&#8221;.</li>
+</ul>
 
 <p>To create a Spark project for IntelliJ:</p>
 
 <ul>
   <li>Download IntelliJ and install the 
 <a href="https://confluence.jetbrains.com/display/SCA/Scala+Plugin+for+IntelliJ+IDEA">Scala
 plug-in for IntelliJ</a>.</li>
-  <li>Go to <code class="language-plaintext highlighter-rouge">File -&gt; Import Project</code>, locate the spark source directory, and select &#8220;Maven Project&#8221;.</li>
+  <li>Go to <code class="language-plaintext highlighter-rouge">File -&gt; Import Project</code>, locate the spark source directory, and select &#8220;Maven Project&#8221;. It&#8217;s important to
+pick Maven instead of sbt here, as Spark has complicated building logic that is implemented for sbt using Scala code
+in <code class="language-plaintext highlighter-rouge">SparkBuilder.scala</code>, and IntelliJ IDEA cannot understand it well.</li>
   <li>In the Import wizard, it&#8217;s fine to leave settings at their default. However it is usually useful
 to enable &#8220;Import Maven projects automatically&#8221;, since changes to the project structure will
 automatically update the IntelliJ project.</li>
diff --git a/site/sitemap.xml b/site/sitemap.xml
index fd71401fa2..e1272626df 100644
--- a/site/sitemap.xml
+++ b/site/sitemap.xml
@@ -1153,23 +1153,23 @@
   <changefreq>weekly</changefreq>
 </url>
 <url>
-  <loc>https://spark.apache.org/streaming/</loc>
+  <loc>https://spark.apache.org/spark-connect/</loc>
   <changefreq>weekly</changefreq>
 </url>
 <url>
-  <loc>https://spark.apache.org/sql/</loc>
+  <loc>https://spark.apache.org/pandas-on-spark/</loc>
   <changefreq>weekly</changefreq>
 </url>
 <url>
-  <loc>https://spark.apache.org/mllib/</loc>
+  <loc>https://spark.apache.org/graphx/</loc>
   <changefreq>weekly</changefreq>
 </url>
 <url>
-  <loc>https://spark.apache.org/graphx/</loc>
+  <loc>https://spark.apache.org/mllib/</loc>
   <changefreq>weekly</changefreq>
 </url>
 <url>
-  <loc>https://spark.apache.org/screencasts/</loc>
+  <loc>https://spark.apache.org/streaming/</loc>
   <changefreq>weekly</changefreq>
 </url>
 <url>
@@ -1177,11 +1177,11 @@
   <changefreq>weekly</changefreq>
 </url>
 <url>
-  <loc>https://spark.apache.org/pandas-on-spark/</loc>
+  <loc>https://spark.apache.org/screencasts/</loc>
   <changefreq>weekly</changefreq>
 </url>
 <url>
-  <loc>https://spark.apache.org/spark-connect/</loc>
+  <loc>https://spark.apache.org/sql/</loc>
   <changefreq>weekly</changefreq>
 </url>
 <url>

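Note on `.dev/run-in-container.sh`: the script exports a `JAVA_HOME` that ends in `-arm64`, so it presumably targets ARM hosts. The following is a minimal sketch, not part of the commit, of how the path could be chosen from the machine architecture instead. The helper name `java_home_for_arch` is hypothetical, and the `-amd64` directory is an assumption based on the usual Debian/Ubuntu OpenJDK package layout; check the actual path inside the `docs-builder` image before relying on it.

```shell
#!/bin/sh
# Hypothetical helper (not in the commit): map an architecture name, as
# printed by `uname -m`, to an OpenJDK 17 directory.
# The arm64 path is the one used in .dev/run-in-container.sh; the amd64
# path is an assumption based on Debian/Ubuntu package naming.
java_home_for_arch() {
  case "$1" in
    aarch64|arm64) echo "/usr/lib/jvm/java-17-openjdk-arm64" ;;
    x86_64|amd64)  echo "/usr/lib/jvm/java-17-openjdk-amd64" ;;
    *)             return 1 ;;  # unknown architecture: let the caller decide
  esac
}

# Usage: report the directory that would be used on this machine.
if java_home_guess=$(java_home_for_arch "$(uname -m)"); then
  echo "JAVA_HOME candidate: $java_home_guess"
else
  echo "unrecognized architecture: $(uname -m)" >&2
fi
```

Inside the container, the result could replace the hard-coded `export JAVA_HOME=...` line, provided the directory actually exists in the image.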

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
