[
https://issues.apache.org/jira/browse/SPARK-24668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16530344#comment-16530344
]
Karthik Palaniappan commented on SPARK-24668:
---------------------------------------------
Sounds good to me – I'll work on implementing option 1 (taking a soft lock). I
could see web interfaces like notebooks calling uiWebUrl() if they have a "View
the Spark UI" button.
> PySpark crashes when getting the webui url if the webui is disabled
> -------------------------------------------------------------------
>
> Key: SPARK-24668
> URL: https://issues.apache.org/jira/browse/SPARK-24668
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 2.3.0, 2.4.0
> Environment: * Spark 2.3.0
> * Spark-on-YARN
> * Java 8
> * Python 3.6.5
> * Jupyter 4.4.0
> Reporter: Karthik Palaniappan
> Priority: Minor
>
> Repro:
>
> Evaluate `sc` in a Jupyter notebook:
>
>
> {{---------------------------------------------------------------------------}}
> {{Py4JJavaError Traceback (most recent call
> last)}}
> {{/opt/conda/lib/python3.6/site-packages/IPython/core/formatters.py in
> __call__(self, obj)}}
> {{ 343 method = get_real_method(obj, self.print_method)}}
> {{ 344 if method is not None:}}
> {{--> 345 return method()}}
> {{ 346 return None}}
> {{ 347 else:}}
> {{/usr/lib/spark/python/pyspark/context.py in _repr_html_(self)}}
> {{ 261 </div>}}
> {{ 262 """.format(}}
> {{--> 263 sc=self}}
> {{ 264 )}}
> {{ 265 }}
> {{/usr/lib/spark/python/pyspark/context.py in uiWebUrl(self)}}
> {{ 373 def uiWebUrl(self):}}
> {{ 374 """Return the URL of the SparkUI instance started by this
> SparkContext"""}}
> {{--> 375 return self._jsc.sc().uiWebUrl().get()}}
> {{ 376 }}
> {{ 377 @property}}
> {{/usr/lib/spark/python/lib/py4j-0.10.6-src.zip/py4j/java_gateway.py in
> __call__(self, *args)}}
> {{ 1158 answer = self.gateway_client.send_command(command)}}
> {{ 1159 return_value = get_return_value(}}
> {{-> 1160 answer, self.gateway_client, self.target_id, self.name)}}
> {{ 1161 }}
> {{ 1162 for temp_arg in temp_args:}}
> {{/usr/lib/spark/python/pyspark/sql/utils.py in deco(*a, **kw)}}
> {{ 61 def deco(*a, **kw):}}
> {{ 62 try:}}
> {{---> 63 return f(*a, **kw)}}
> {{ 64 except py4j.protocol.Py4JJavaError as e:}}
> {{ 65 s = e.java_exception.toString()}}
> {{/usr/lib/spark/python/lib/py4j-0.10.6-src.zip/py4j/protocol.py in
> get_return_value(answer, gateway_client, target_id, name)}}
> {{ 318 raise Py4JJavaError(}}
> {{ 319 "An error occurred while calling {0}{1}{2}.\n".}}
> {{--> 320 format(target_id, ".", name), value)}}
> {{ 321 else:}}
> {{ 322 raise Py4JError(}}
> {{Py4JJavaError: An error occurred while calling o80.get.}}
> {{: java.util.NoSuchElementException: None.get}}
> {{ at scala.None$.get(Option.scala:347)}}
> {{ at scala.None$.get(Option.scala:345)}}
> {{ at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)}}
> {{ at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)}}
> {{ at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)}}
> {{ at java.lang.reflect.Method.invoke(Method.java:498)}}
> {{ at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)}}
> {{ at
> py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)}}
> {{ at py4j.Gateway.invoke(Gateway.java:282)}}
> {{ at
> py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)}}
> {{ at py4j.commands.CallCommand.execute(CallCommand.java:79)}}
> {{ at py4j.GatewayConnection.run(GatewayConnection.java:214)}}
> {{ at java.lang.Thread.run(Thread.java:748)}}
>
> PySpark only prints out the web UI URL in `_repr_html_`, not `__repr__`, so
> this only happens in notebooks that render HTML, not the pyspark shell.
> [https://github.com/apache/spark/commit/f654b39a63d4f9b118733733c7ed2a1b58649e3d]
>
> Disabling Spark's UI with `spark.ui.enabled` *is* valuable outside of tests.
> A couple reasons that come to mind:
> 1) If you run multiple Spark applications from one machine, each one
> irritatingly starts with the same port (4040) as the first application and
> then increments (4041, 4042, etc.) until it finds an open port. If you are
> running 10 Spark apps, then the 11th prints out 10 warnings about ports being
> taken before it finally finds one.
> 2) You can serve the spark web ui from a dedicated spark history server
> instead of per-driver. This is documented here, at least for Spark-on-YARN:
> [https://spark.apache.org/docs/latest/running-on-yarn.html#using-the-spark-history-server-to-replace-the-spark-web-ui].
>
> PySpark should not crash if the web ui is disabled. There are a couple of
> options:
> 1) SparkContext#uiWebUrl() in Scala should return the driver web ui url or
> the history server url, depending on which one is being used.
> 2) PySpark should call getOrElse(None) rather than get().
>
> I strongly prefer option 1), but I can't figure out how to do it in a
> non-hacky way. In SparkContext.scala, uiWebUrl() comes from
> `_ui.map(_.webUrl)`, where `_ui` contains the actual SparkUI if
> spark.ui.enabled=true.
> 1) I could set `_ui` to SparkUI.createHistoryUI(), and then just avoid
> calling `bind()` on the UI server. I'm not sure what the implications would
> be for classes outside of SparkContext that use SparkContext#ui.
> 2) I could make `_ui` and `uiWebUrl()` inconsistent. `_ui` only contains the
> in-driver UI and `uiWebUrl()` returns the in-driver or history URL.
>
> I would appreciate some help figuring out how to proceed.
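
For reference, option 2 from the description (not calling get() on an empty Option) could look roughly like the sketch below on the Python side. This is only an illustration, not the actual patch: `safe_ui_web_url` is a hypothetical helper, and it assumes that `sc._jsc.sc().uiWebUrl()` returns a py4j proxy for a `scala.Option[String]`, which is what the traceback suggests. The `Fake*` classes are stand-ins so the snippet runs without a JVM; they are not Spark or py4j classes.

```python
def safe_ui_web_url(sc):
    """Return the Spark UI URL for `sc`, or None when the UI is disabled.

    Hypothetical helper, not part of PySpark. Assumes that
    sc._jsc.sc().uiWebUrl() yields a proxy for a scala.Option[String].
    """
    url_option = sc._jsc.sc().uiWebUrl()
    # scala.Option exposes isDefined(); only call get() when a value is
    # present, which avoids the NoSuchElementException from None.get.
    return url_option.get() if url_option.isDefined() else None


# Stand-ins that mimic only the calls used above (assumptions for the demo).
class FakeOption:
    def __init__(self, value=None):
        self._value = value

    def isDefined(self):
        return self._value is not None

    def get(self):
        if self._value is None:
            raise RuntimeError("java.util.NoSuchElementException: None.get")
        return self._value


class FakeScalaContext:
    def __init__(self, option):
        self._option = option

    def uiWebUrl(self):
        return self._option


class FakeJavaContext:
    def __init__(self, option):
        self._scala = FakeScalaContext(option)

    def sc(self):
        return self._scala


class FakePySparkContext:
    def __init__(self, option):
        self._jsc = FakeJavaContext(option)


ui_enabled = FakePySparkContext(FakeOption("http://driver:4040"))
ui_disabled = FakePySparkContext(FakeOption())
```

With these stand-ins, `safe_ui_web_url(ui_enabled)` returns the URL string and `safe_ui_web_url(ui_disabled)` returns None instead of raising, so a notebook's `_repr_html_` or a "View the Spark UI" button could check for None before rendering a link.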