This is an automated email from the ASF dual-hosted git repository.
gengliang pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.0 by this push:
new 6abfeb6 [SPARK-33611][UI] Avoid encoding twice on the query parameter
of rewritten proxy URL
6abfeb6 is described below
commit 6abfeb6884a3cdfe4c6e621219e6cf5a35d6467e
Author: Gengliang Wang <[email protected]>
AuthorDate: Wed Dec 2 01:36:41 2020 +0800
[SPARK-33611][UI] Avoid encoding twice on the query parameter of rewritten
proxy URL
### What changes were proposed in this pull request?
When running Spark behind a reverse proxy(e.g. Nginx, Apache HTTP server),
the request URL can be encoded twice if we pass the query string directly to
the constructor of `java.net.URI`:
```
> val uri = "http://localhost:8081/test"
> val query = "order%5B0%5D%5Bcolumn%5D=0" // query string of URL from the
reverse proxy
> val rewrittenURI = URI.create(uri.toString())
> new URI(rewrittenURI.getScheme(),
rewrittenURI.getAuthority(),
rewrittenURI.getPath(),
query,
rewrittenURI.getFragment()).toString
result: http://localhost:8081/test?order%255B0%255D%255Bcolumn%255D=0
```
In Spark's stage page, the URL of "/taskTable" contains query parameter
order[0][dir]. After encoding twice, the query parameter becomes
`order%255B0%255D%255Bdir%255D` and it will be decoded as
`order%5B0%5D%5Bdir%5D` instead of `order[0][dir]`. As a result, there will be
NullPointerException from
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/status/api/v1/StagesResource.scala#L176
Other than that, the other parameter may not work as expected after encoded
twice.
This PR is to fix the bug by calling the method `URI.create(String URL)`
directly. This convenience method can avoid encoding twice on the query
parameter.
```
> val uri = "http://localhost:8081/test"
> val query = "order%5B0%5D%5Bcolumn%5D=0"
> URI.create(s"$uri?$query").toString
result: http://localhost:8081/test?order%5B0%5D%5Bcolumn%5D=0
> URI.create(s"$uri?$query").getQuery
result: order[0][column]=0
```
### Why are the changes needed?
Fix a potential bug when Spark's reverse proxy is enabled.
The bug itself is similar to https://github.com/apache/spark/pull/29271.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Add a new unit test.
Also, Manual UI testing for master, worker and app UI with an nginx proxy
Spark config:
```
spark.ui.port 8080
spark.ui.reverseProxy=true
spark.ui.reverseProxyUrl=/path/to/spark/
```
nginx config:
```
server {
listen 9000;
set $SPARK_MASTER http://127.0.0.1:8080;
# split spark UI path into prefix and local path within master UI
location ~ ^(/path/to/spark/) {
# strip prefix when forwarding request
rewrite /path/to/spark(/.*) $1 break;
#rewrite /path/to/spark/ "/" ;
# forward to spark master UI
proxy_pass $SPARK_MASTER;
proxy_intercept_errors on;
error_page 301 302 307 = handle_redirects;
}
location handle_redirects {
set $saved_redirect_location '$upstream_http_location';
proxy_pass $saved_redirect_location;
}
}
```
Closes #30552 from gengliangwang/decodeProxyRedirect.
Authored-by: Gengliang Wang <[email protected]>
Signed-off-by: Gengliang Wang <[email protected]>
(cherry picked from commit 5d0045eedf4b138c031accac2b1fa1e8d6f3f7c6)
Signed-off-by: Gengliang Wang <[email protected]>
---
core/src/main/scala/org/apache/spark/ui/JettyUtils.scala | 16 ++++++----------
core/src/test/scala/org/apache/spark/ui/UISuite.scala | 9 +++++++++
2 files changed, 15 insertions(+), 10 deletions(-)
diff --git a/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
b/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
index a4ba565..3820a88 100644
--- a/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
+++ b/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
@@ -400,17 +400,13 @@ private[spark] object JettyUtils extends Logging {
uri.append(rest)
}
- val rewrittenURI = URI.create(uri.toString())
- if (query != null) {
- return new URI(
- rewrittenURI.getScheme(),
- rewrittenURI.getAuthority(),
- rewrittenURI.getPath(),
- query,
- rewrittenURI.getFragment()
- ).normalize()
+ val queryString = if (query == null) {
+ ""
+ } else {
+ s"?$query"
}
- rewrittenURI.normalize()
+ // SPARK-33611: use method `URI.create` to avoid percent-encoding twice on
the query string.
+ URI.create(uri.toString() + queryString).normalize()
}
def createProxyLocationHeader(
diff --git a/core/src/test/scala/org/apache/spark/ui/UISuite.scala
b/core/src/test/scala/org/apache/spark/ui/UISuite.scala
index 56026ea..c7e1dfe 100644
--- a/core/src/test/scala/org/apache/spark/ui/UISuite.scala
+++ b/core/src/test/scala/org/apache/spark/ui/UISuite.scala
@@ -216,6 +216,15 @@ class UISuite extends SparkFunSuite {
assert(rewrittenURI === null)
}
+ test("SPARK-33611: Avoid encoding twice on the query parameter of proxy
rewrittenURI") {
+ val prefix = "/worker-id"
+ val target = "http://localhost:8081"
+ val path = "/worker-id/json"
+ val rewrittenURI =
+ JettyUtils.createProxyURI(prefix, target, path,
"order%5B0%5D%5Bcolumn%5D=0")
+ assert(rewrittenURI.toString ===
"http://localhost:8081/json?order%5B0%5D%5Bcolumn%5D=0")
+ }
+
test("verify rewriting location header for reverse proxy") {
val clientRequest = mock(classOf[HttpServletRequest])
var headerValue = "http://localhost:4040/jobs"
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]