Hi Holden,

This is the digest for commits merged between *June 3 and June 16*. The
commits you mentioned will be included in future digests.

Cheers,

Xingbo

On Tue, Jul 21, 2020 at 11:13 AM Holden Karau <hol...@pigscanfly.ca> wrote:

> I'd also add [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are
> being shutdown &
>
> [SPARK-21040][CORE] Speculate tasks which are running on decommission
> executors, two of the PRs merged after the decommissioning SPIP.
>
> On Tue, Jul 21, 2020 at 10:53 AM Xingbo Jiang <jiangxb1...@gmail.com>
> wrote:
>
>> Hi all,
>>
>> This is the bi-weekly Apache Spark digest from the Databricks OSS team.
>> For each API/configuration/behavior change, an *[API]* tag is added to
>> the title.
>>
>> CORE
>> <https://github.com/databricks/runtime/wiki/OSS-Digest-June-3-~-June-9,-2020#70spark-31923core-ignore-internal-accumulators-that-use-unrecognized-types-rather-than-crashing-63--5>[3.0][SPARK-31923][CORE]
>> Ignore internal accumulators that use unrecognized types rather than
>> crashing (+63, -5)>
>> <https://github.com/apache/spark/commit/b333ed0c4a5733a9c36ad79de1d4c13c6cf3c5d4>
>>
>> A user may name their accumulators with the internal.metrics. prefix, so
>> that Spark treats them as internal accumulators and hides them from the UI.
>> This change makes JsonProtocol.accumValueToJson more robust so that it
>> ignores internal accumulators that use unrecognized types instead of
>> crashing.
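>>
>> For illustration, a minimal Scala sketch (the accumulator type and name here
>> are made up, assuming a spark-shell session):
>>
>> // Registered under the internal prefix, so Spark treats it as internal and
>> // hides it from the UI; its value type is not one JsonProtocol recognizes
>> // for internal accumulators.
>> val acc = spark.sparkContext.collectionAccumulator[String]("internal.metrics.myCustomMetric")
>> // Before the fix, event logging could crash on such an accumulator; now
>> // JsonProtocol.accumValueToJson simply ignores the unrecognized value type.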
>>
>> <https://github.com/databricks/runtime/wiki/OSS-Digest-June-3-~-June-9,-2020#api80spark-31486core-sparksubmitwaitappcompletion-flag-to-control-spark-submit-exit-in-standalone-cluster-mode-88--26>[API][3.1][SPARK-31486][CORE]
>> spark.submit.waitAppCompletion flag to control spark-submit exit in
>> Standalone Cluster Mode (+88, -26)>
>> <https://github.com/apache/spark/commit/6befb2d8bdc5743d0333f4839cf301af165582ce>
>>
>> This PR implements an application wait mechanism that allows spark-submit to
>> wait until the application finishes in Standalone mode, delaying the exit of
>> the spark-submit JVM until the job completes. The implementation keeps
>> monitoring the application until it finishes, fails, or is killed. The
>> behavior is controlled via the following conf:
>>
>>    - spark.standalone.submit.waitAppCompletion (Default: false)
>>
>>      In standalone cluster mode, controls whether the client waits to exit
>>      until the application completes. If set to true, the client process
>>      will stay alive polling the driver's status. Otherwise, the client
>>      process will exit after submission.
>>
>>
>> <https://github.com/databricks/runtime/wiki/OSS-Digest-June-3-~-June-9,-2020#sql>
>> SQL
>> <https://github.com/databricks/runtime/wiki/OSS-Digest-June-3-~-June-9,-2020#71spark-31220sql-repartition-obeys-initialpartitionnum-when-adaptiveexecutionenabled-27--12>[3.0][SPARK-31220][SQL]
>> repartition obeys initialPartitionNum when adaptiveExecutionEnabled (+27,
>> -12)>
>> <https://github.com/apache/spark/commit/1d1eacde9d1b6fb75a20e4b909d221e70ad737db>
>>
>> AQE and non-AQE use different configs to set the initial shuffle
>> partition number. This PR fixes repartition/DISTRIBUTE BY so that they
>> also use the AQE config
>> spark.sql.adaptive.coalescePartitions.initialPartitionNum to set the
>> initial shuffle partition number when AQE is enabled.
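>>
>> A brief Scala sketch of the configs involved (values are illustrative,
>> assuming a spark-shell session):
>>
>> import org.apache.spark.sql.functions.col
>>
>> spark.conf.set("spark.sql.adaptive.enabled", "true")
>> spark.conf.set("spark.sql.adaptive.coalescePartitions.initialPartitionNum", "100")
>>
>> // With the fix, this repartition shuffle also starts from 100 partitions
>> // (before AQE possibly coalesces them) instead of spark.sql.shuffle.partitions.
>> val df = spark.range(1000).repartition(col("id"))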
>>
>> <https://github.com/databricks/runtime/wiki/OSS-Digest-June-3-~-June-9,-2020#70spark-31867sqlfollowup-check-result-differences-for-datetime-formatting-51--8>[3.0][SPARK-31867][SQL][FOLLOWUP]
>> Check result differences for datetime formatting (+51, -8)>
>> <https://github.com/apache/spark/commit/fc6af9d900ec6f6a1cbe8f987857a69e6ef600d1>
>>
>> Spark now throws SparkUpgradeException when a DateTimeException occurs during
>> datetime formatting under the EXCEPTION legacy time parser policy
>> (spark.sql.legacy.timeParserPolicy=EXCEPTION).
>>
>> <https://github.com/databricks/runtime/wiki/OSS-Digest-June-3-~-June-9,-2020#api70spark-31879spark-31892sql-disable-week-based-pattern-letters-in-datetime-parsingformatting-1421--171-102--48>[API][3.0][SPARK-31879][SPARK-31892][SQL]
>> Disable week-based pattern letters in datetime parsing/formatting (+1421,
>> -171)>
>> <https://github.com/apache/spark/commit/9d5b5d0a5849ac329bbae26d9884d8843d8a8571>
>>  (+102,
>> -48)>
>> <https://github.com/apache/spark/commit/afe95bd9ad7a07c49deecf05f0a1000bb8f80caa>
>>
>> Week-based pattern letters have very weird behaviors during datetime
>> parsing in Spark 2.4, and it's very hard to simulate the legacy behaviors
>> with the new API. For formatting, the new API makes the start-of-week
>> localized, and it's not possible to keep the legacy behaviors. Since the
>> week-based fields are rarely used, we disable week-based pattern letters in
>> both parsing and formatting.
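>>
>> For example, a week-based letter in a pattern is now rejected (a sketch; the
>> exact error type and message may differ):
>>
>> // 'Y' (week-based year) and 'w' (week of week-based year) are no longer allowed
>> spark.sql("SELECT date_format(current_date(), 'YYYY-ww')").show()
>> // expected: an exception about unsupported week-based pattern letters,
>> // instead of the locale-dependent result Spark 2.4 would produce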
>>
>> <https://github.com/databricks/runtime/wiki/OSS-Digest-June-3-~-June-9,-2020#70spark-31896sql-handle-am-pm-timestamp-parsing-when-hour-is-missing-39--3>[3.0][SPARK-31896][SQL]
>> Handle am-pm timestamp parsing when hour is missing (+39, -3)>
>> <https://github.com/apache/spark/commit/afcc14c6d27f9e0bd113e0d86b64dc6fa4eed551>
>>
>> This PR sets the hour field to 0 or 12 when the AMPM_OF_DAY field is AM
>> or PM during datetime parsing, to keep the behavior the same as Spark 2.4.
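>>
>> A minimal illustration (the expected values follow from the rule above;
>> verify on an actual build):
>>
>> spark.sql("SELECT to_timestamp('PM', 'a')").show(false)
>> // hour defaults to 12 because AMPM_OF_DAY is PM -> 1970-01-01 12:00:00
>> spark.sql("SELECT to_timestamp('AM', 'a')").show(false)
>> // hour defaults to 0 because AMPM_OF_DAY is AM  -> 1970-01-01 00:00:00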
>>
>> <https://github.com/databricks/runtime/wiki/OSS-Digest-June-3-~-June-9,-2020#api80spark-31830sql-consistent-error-handling-for-datetime-formatting-and-parsing-functions-126--580>[API][3.1][SPARK-31830][SQL]
>> Consistent error handling for datetime formatting and parsing functions
>> (+126, -580)>
>> <https://github.com/apache/spark/commit/6a424b93e5bdb79b1f1310cf48bd034397779e14>
>>
>> When parsing/formatting datetime values, it's better to fail fast if the
>> pattern string is invalid, instead of returning null for each input record.
>> Formatting functions such as date_format already do this; this PR applies
>> the fail-fast behavior to the parsing functions from_unixtime,
>> unix_timestamp, to_unix_timestamp, to_timestamp, and to_date.
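>>
>> A sketch of the new behavior (the invalid pattern below is just an example):
>>
>> // valid pattern: parses as before
>> spark.sql("SELECT to_timestamp('2020-06-16', 'yyyy-MM-dd')").show()
>>
>> // invalid pattern letter 'b': now fails fast with an exception describing
>> // the bad pattern, instead of silently returning NULL for every input row
>> spark.sql("SELECT from_unixtime(0, 'yyyy-bb')").show()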
>>
>> <https://github.com/databricks/runtime/wiki/OSS-Digest-June-3-~-June-9,-2020#80spark-31910sql-enable-java-8-time-api-in-thrift-server-23--0>[3.1][SPARK-31910][SQL]
>> Enable Java 8 time API in Thrift server (+23, -0)>
>> <https://github.com/apache/spark/commit/2c9988eaf31b7ebd97f2c2904ed7ee531eff0d20>
>>
>> This PR enables the Java 8 time API in the Thrift server, so that the
>> session time zone is used more consistently.
>>
>> <https://github.com/databricks/runtime/wiki/OSS-Digest-June-3-~-June-9,-2020#55spark-31935sql-hadoop-file-system-config-should-be-effective-in-data-source-options-52--7>[2.4][SPARK-31935][SQL]
>> Hadoop file system config should be effective in data source options (+52,
>> -7)>
>> <https://github.com/apache/spark/commit/f3771c6b47d0b3aef10b86586289a1f675c7cfe2>
>>
>> This PR fixes a bug where Hadoop configs passed in read/write options were
>> not respected in data source V1.
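>>
>> For example, per-query Hadoop configs such as S3A credentials passed as data
>> source options now take effect in DSv1 (the key names, credentials, and path
>> below are illustrative):
>>
>> val df = spark.read
>>   .option("fs.s3a.access.key", "<access-key>")
>>   .option("fs.s3a.secret.key", "<secret-key>")
>>   .parquet("s3a://my-bucket/events/")
>>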
>> [API][2.4][SPARK-31968][SQL] Duplicate partition columns check when
>> writing data (+12, -1)>
>> <https://github.com/apache/spark/commit/a4ea599b1b9b8ebaae0100b54e6ac1d7576c6d8c>
>>
>> Add a check for duplicate partition columns when writing to built-in file
>> sources. After the change, when the DataFrame has duplicate partition
>> columns, users get an AnalysisException when writing it. Previously, the
>> write would succeed, but reading the files with duplicate columns would
>> fail.
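>>
>> A minimal Scala sketch of the new check (the data and output path are
>> hypothetical):
>>
>> import spark.implicits._
>>
>> val df = Seq((1, "x")).toDF("id", "part")
>> // Now throws AnalysisException because the partition column "part" is
>> // duplicated; previously the write succeeded and only reading it back failed.
>> df.write.partitionBy("part", "part").parquet("/tmp/dup_partition_demo")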
>>
>> <https://github.com/databricks/runtime/wiki/OSS-Digest-June-10-~-June-16,-2020#api71spark-26905sql-add-type-in-the-ansi-non-reserved-list-2--0>[API][3.0][SPARK-26905][SQL]
>> Add TYPE in the ANSI non-reserved list (+2, -0)>
>> <https://github.com/apache/spark/commit/e14029b18df10db5094f8abf8b9874dbc9186b4e>
>>
>> Add TYPE to the ANSI non-reserved list to follow the ANSI/SQL standard.
>> The change impacts the behavior only when ANSI mode is on
>> (spark.sql.ansi.enabled=true).
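>>
>> For example, with ANSI mode on, TYPE can now be used as an identifier (a
>> minimal sketch):
>>
>> spark.conf.set("spark.sql.ansi.enabled", "true")
>> // 'type' is accepted as a column alias under ANSI mode after this change
>> spark.sql("SELECT 1 AS type").show()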
>>
>> <https://github.com/databricks/runtime/wiki/OSS-Digest-June-10-~-June-16,-2020#api71spark-26905sql-follow-the-sql2016-reserved-keywords-429--5>[API][3.0][SPARK-26905][SQL]
>> Follow the SQL:2016 reserved keywords (+429, -5)>
>> <https://github.com/apache/spark/commit/3698a14204dd861ea3ee3c14aa923123b52caba1>
>>
>> Move the keywords ANTI, SEMI, and MINUS from the reserved to the
>> non-reserved list to comply with the ANSI/SQL standard. The change impacts
>> the behavior only when ANSI mode is on (spark.sql.ansi.enabled=true).
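>>
>> Similarly, with ANSI mode on, these keywords can now be used as identifiers
>> (a minimal sketch):
>>
>> spark.conf.set("spark.sql.ansi.enabled", "true")
>> // ANTI, SEMI and MINUS are no longer reserved under ANSI mode
>> spark.sql("SELECT 1 AS anti, 2 AS semi, 3 AS minus").show()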
>>
>> <https://github.com/databricks/runtime/wiki/OSS-Digest-June-10-~-June-16,-2020#api70spark-31939sqltest-java11-fix-parsing-day-of-year-when-year-field-pattern-is-missing-465--3>[API][3.0][SPARK-31939][SQL][TEST-JAVA11]
>> Fix Parsing day of year when year field pattern is missing (+465, -3)>
>> <https://github.com/apache/spark/commit/22dda6e18e91c6db6fa8ff9fafaafe09a79db4ea>
>>
>> When a datetime pattern does not contain a year field (i.e., 'yyyy') but
>> contains the day-of-year field (i.e., 'DD'), Spark should still respect the
>> datetime pattern and parse the values correctly.
>>
>> Before the change:
>>
>> spark-sql> select to_timestamp('31', 'DD');
>> 1970-01-01 00:00:00
>> spark-sql> select to_timestamp('31 30', 'DD dd');
>> 1970-01-30 00:00:00
>>
>> After the change:
>>
>> spark-sql> select to_timestamp('31', 'DD');
>> 1970-01-31 00:00:00
>> spark-sql> select to_timestamp('31 30', 'DD dd');
>> NULL
>>
>>
>> <https://github.com/databricks/runtime/wiki/OSS-Digest-June-10-~-June-16,-2020#70spark-31956sql-do-not-fail-if-there-is-no-ambiguous-self-join-7--2>[3.0][SPARK-31956][SQL]
>> Do not fail if there is no ambiguous self join (+7, -2)>
>> <https://github.com/apache/spark/commit/c40051932290db3a63f80324900a116019b1e589>
>>
>> df("col").as("name") is not a column reference anymore, and should not
>> have the special column metadata that is used to identify the root
>> attribute (e.g., Dataset ID and col position). This PR fixes the
>> corresponding regression that could cause a DataFrame could fail even when
>> there is no ambiguous self-join. Below is an example,
>>
>> val joined = df.join(spark.range(1)).select($"a")
>> joined.select(joined("a").alias("x"), sum(joined("a")).over(w))
>>
>>
>> <https://github.com/databricks/runtime/wiki/OSS-Digest-June-10-~-June-16,-2020#70spark-31958sql-normalize-special-floating-numbers-in-subquery-18--4>[3.0][SPARK-31958][SQL]
>> normalize special floating numbers in subquery (+18, -4)>
>> <https://github.com/apache/spark/commit/6fb9c80da129d0b43f9ff5b8be6ce8bad992a4ed>
>>
>> The PR fixes a bug where special floating-point numbers in non-correlated
>> subquery expressions were not normalized; the subquery expressions are now
>> handled by OptimizeSubqueries.
>>
>> <https://github.com/databricks/runtime/wiki/OSS-Digest-June-10-~-June-16,-2020#api80spark-21117sql-built-in-sql-function-support---width_bucket-431--30>[API][3.1][SPARK-21117][SQL]
>> Built-in SQL Function Support - WIDTH_BUCKET (+431, -30)>
>> <https://github.com/apache/spark/commit/b1adc3deee00058cba669534aee156dc7af243dc>
>>
>> Add a built-in SQL function WIDTH_BUCKET that returns the bucket number
>> to which value would be assigned in an equi-width histogram with
>> num_bucket buckets, in the range min_value to max_value. Examples:
>>
>> > SELECT WIDTH_BUCKET(5.3, 0.2, 10.6, 5);
>> 3
>> > SELECT WIDTH_BUCKET(-2.1, 1.3, 3.4, 3);
>> 0
>> > SELECT WIDTH_BUCKET(8.1, 0.0, 5.7, 4);
>> 5
>> > SELECT WIDTH_BUCKET(-0.9, 5.2, 0.5, 2);
>> 3
>>
>>
>> <https://github.com/databricks/runtime/wiki/OSS-Digest-June-10-~-June-16,-2020#80spark-27217sql-nested-column-aliasing-for-more-operators-which-can-prune-nested-column-190--10>[3.1][SPARK-27217][SQL]
>> Nested column aliasing for more operators which can prune nested column
>> (+190, -10)>
>> <https://github.com/apache/spark/commit/43063e2db2bf7469f985f1954d8615b95cf5c578>
>>
>> Support nested column pruning from an Aggregate or Expand operator.
>>
>> <https://github.com/databricks/runtime/wiki/OSS-Digest-June-10-~-June-16,-2020#80spark-27633sql-remove-redundant-aliases-in-nestedcolumnaliasing-43--1>[3.1][SPARK-27633][SQL]
>> Remove redundant aliases in NestedColumnAliasing (+43, -1)>
>> <https://github.com/apache/spark/commit/8282bbf12d4e174986a649023ce3984aae7d7755>
>>
>> Avoid generating redundant aliases if the parent nested field is aliased
>> in the NestedColumnAliasing rule. This slightly improves the performance.
>>
>> <https://github.com/databricks/runtime/wiki/OSS-Digest-June-10-~-June-16,-2020#80spark-31736sql-nested-column-aliasing-for-repartitionbyexpressionjoin-197--16>[3.1][SPARK-31736][SQL]
>> Nested column aliasing for RepartitionByExpression/Join (+197, -16)>
>> <https://github.com/apache/spark/commit/ff89b1114319e783eb4f4187bf2583e5e21c64e4>
>>
>> Support nested column pruning from a RepartitionByExpression or Join
>> operator.
>>
>> ML
>> <https://github.com/databricks/runtime/wiki/OSS-Digest-June-10-~-June-16,-2020#80spark-31925ml-summarytotaliterations-greater-than-maxiters-43--12>[3.1][SPARK-31925][ML]
>> Summary.totalIterations greater than maxIters (+43, -12)>
>> <https://github.com/apache/spark/commit/f83cb3cbb3ce3f22fd122bce620917bfd0699ce7>
>>
>> The PR fixes a correctness issue in LogisticRegression and
>> LinearRegression where the actual number of training iterations was larger
>> by 1 than the specified maxIter.
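>>
>> A brief Scala sketch (the toy dataset is made up for this example):
>>
>> import org.apache.spark.ml.classification.LogisticRegression
>> import org.apache.spark.ml.linalg.Vectors
>>
>> val training = spark.createDataFrame(Seq(
>>   (0.0, Vectors.dense(0.0, 1.1)),
>>   (1.0, Vectors.dense(2.0, 1.0)),
>>   (0.0, Vectors.dense(2.0, 1.3)),
>>   (1.0, Vectors.dense(0.0, 1.2))
>> )).toDF("label", "features")
>>
>> val model = new LogisticRegression().setMaxIter(10).fit(training)
>> // After the fix, this no longer exceeds the configured maxIter of 10
>> println(model.summary.totalIterations)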
>>
>> <https://github.com/databricks/runtime/wiki/OSS-Digest-June-10-~-June-16,-2020#80spark-31944-add-instance-weight-support-in-linearregressionsummary-56--24>[3.1][SPARK-31944]
>> Add instance weight support in LinearRegressionSummary (+56, -24)>
>> <https://github.com/apache/spark/commit/89c98a4c7068734e322d335cb7c9f22379ff00e8>
>>
>> The PR adds instance weight support in LinearRegressionSummary; instance
>> weight is already supported by LinearRegression and RegressionMetrics.
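>>
>> A brief Scala sketch (the column names and toy data are made up):
>>
>> import org.apache.spark.ml.regression.LinearRegression
>> import org.apache.spark.ml.linalg.Vectors
>>
>> val training = spark.createDataFrame(Seq(
>>   (1.0, 2.0, Vectors.dense(1.0)),
>>   (2.0, 1.0, Vectors.dense(2.0)),
>>   (3.0, 0.5, Vectors.dense(3.0))
>> )).toDF("label", "weight", "features")
>>
>> val model = new LinearRegression().setWeightCol("weight").fit(training)
>> // Summary metrics now take the per-row instance weights into account
>> println(model.summary.rootMeanSquaredError)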
>>
>> <https://github.com/databricks/runtime/wiki/OSS-Digest-June-10-~-June-16,-2020#ss>
>> SS
>> <https://github.com/databricks/runtime/wiki/OSS-Digest-June-10-~-June-16,-2020#71spark-31593ss-remove-unnecessary-streaming-query-progress-update-58--7>[3.0][SPARK-31593][SS]
>> Remove unnecessary streaming query progress update (+58, -7)>
>> <https://github.com/apache/spark/commit/1e40bccf447dccad9d31bccc75d21b8fca77ba52>
>>
>> The PR fixes a bug that reported incorrect metrics in Structured
>> Streaming. A progress update should be made every 10 seconds when a stream
>> doesn't have any new data upstream. Without the fix, the input information
>> was zeroed out but the output information was not when making that
>> progress update.
>>
>> <https://github.com/databricks/runtime/wiki/OSS-Digest-June-10-~-June-16,-2020#70spark-31990ss-use-tosettoseq-in-datasetdropduplicates-3--1>[3.0][SPARK-31990][SS]
>> Use toSet.toSeq in Dataset.dropDuplicates (+3, -1)>
>> <https://github.com/apache/spark/commit/7f7b4dd5199e7c185aedf51fccc400c7072bed05>
>>
>> The PR proposes to preserve the input order of colNames for groupCols in
>> Dataset.dropDuplicates, because Structured Streaming's state store depends
>> on the groupCols order.
>>
>> <https://github.com/databricks/runtime/wiki/OSS-Digest-June-10-~-June-16,-2020#80spark-24634ss-add-a-new-metric-regarding-number-of-inputs-later-than-watermark-plus-allowed-delay-94--29>[3.1][SPARK-24634][SS]
>> Add a new metric regarding number of inputs later than watermark plus
>> allowed delay (+94, -29)>
>> <https://github.com/apache/spark/commit/84815d05503460d58b85be52421d5923474aa08b>
>>
>> Add a new metric numLateInputs to count the number of inputs which are
>> later than the watermark ('inputs' are relative to operators). The new
>> metric is provided both on the Spark UI (SQL tab, query execution details
>> page) and in the Streaming Query Listener.
>>
>> <https://github.com/databricks/runtime/wiki/OSS-Digest-June-3-~-June-9,-2020#python>
>> PYTHON
>> <https://github.com/databricks/runtime/wiki/OSS-Digest-June-3-~-June-9,-2020#api70spark-31895pythonsql-support-dataframeexplainextended-str-case-to-be-consistent-with-scala-side-24--11>[API][3.0][SPARK-31895][PYTHON][SQL]
>> Support DataFrame.explain(extended: str) case to be consistent with Scala
>> side (+24, -11)>
>> <https://github.com/apache/spark/commit/e1d52011401c1989f26b230eb8c82adc63e147e7>
>>
>> Improves DataFrame.explain in PySpark so that it also accepts the explain
>> mode as a string, e.g. df.explain("extended"), consistent with the Scala
>> API.
>>
>> [3.0][SPARK-31915][SQL][PYTHON] Resolve the grouping column properly per
>> the case sensitivity in grouped and cogrouped pandas UDFs (+37, -8)>
>> <https://github.com/apache/spark/commit/00d06cad564d5e3e5f78a687776d02fe0695a861>
>>
>> The PR proposes to resolve the grouping attributes separately first so
>> that they can be properly referred to when FlatMapGroupsInPandas and
>> FlatMapCoGroupsInPandas are resolved, without ambiguity. Example:
>>
>> from pyspark.sql.functions import *
>>
>> df = spark.createDataFrame([[1, 1]], ["column", "Score"])
>>
>> @pandas_udf("column integer, Score float", PandasUDFType.GROUPED_MAP)
>> def my_pandas_udf(pdf):
>>     return pdf.assign(Score=0.5)
>>
>> df.groupby('COLUMN').apply(my_pandas_udf).show()
>>
>> df1 = spark.createDataFrame([(1, 1)], ("column", "value"))
>> df2 = spark.createDataFrame([(1, 1)], ("column", "value"))
>>
>> df1.groupby("COLUMN").cogroup(
>>     df2.groupby("COLUMN")
>> ).applyInPandas(lambda r, l: r + l, df1.schema).show()
>>
>> Before:
>>
>> pyspark.sql.utils.AnalysisException: Reference 'COLUMN' is ambiguous, could 
>> be: COLUMN, COLUMN.;
>>
>> pyspark.sql.utils.AnalysisException: cannot resolve '`COLUMN`' given input 
>> columns: [COLUMN, COLUMN, value, value];;
>> 'FlatMapCoGroupsInPandas ['COLUMN], ['COLUMN], <lambda>(column#9L, 
>> value#10L, column#13L, value#14L), [column#22L, value#23L]
>> :- Project [COLUMN#9L, column#9L, value#10L]
>> :  +- LogicalRDD [column#9L, value#10L], false
>> +- Project [COLUMN#13L, column#13L, value#14L]
>>    +- LogicalRDD [column#13L, value#14L], false
>>
>> After:
>>
>> +------+-----+
>> |column|Score|
>> +------+-----+
>> |     1|  0.5|
>> +------+-----+
>>
>> +------+-----+
>> |column|value|
>> +------+-----+
>> |     2|    2|
>> +------+-----+
>>
>>
>> <https://github.com/databricks/runtime/wiki/OSS-Digest-June-10-~-June-16,-2020#80spark-31945sqlpyspark-enable-cache-for-the-same-python-function-25--4>[3.1][SPARK-31945][SQL][PYSPARK]
>> Enable cache for the same Python function (+25, -4)>
>> <https://github.com/apache/spark/commit/032d17933b4009ed8a9d70585434ccdbf4d1d7df>
>>
>> This PR proposes to make PythonFunction hold Seq[Byte] instead of
>> Array[Byte], so that equality checks compare the byte values rather than
>> the array reference. With this change, the cache manager can detect that
>> two Python functions are the same and reuse the existing cache for them.
>>
>> <https://github.com/databricks/runtime/wiki/OSS-Digest-June-10-~-June-16,-2020#80spark-31964python-use-pandas-is_categorical-on-arrow-category-type-conversion-2--5>[3.1][SPARK-31964][PYTHON]
>> Use Pandas is_categorical on Arrow category type conversion (+2, -5)>
>> <https://github.com/apache/spark/commit/b7ef5294f17d54e7d90e36a4be02e8bd67200144>
>>
>> When using PyArrow to convert a Pandas categorical column, use
>> is_categorical instead of trying to import CategoricalDtype, because the
>> former is a more stable API.
>>
>> <https://github.com/databricks/runtime/wiki/OSS-Digest-June-3-~-June-9,-2020#ui>
>> UI
>> <https://github.com/databricks/runtime/wiki/OSS-Digest-June-3-~-June-9,-2020#70spark-31903sqlpysparkr-fix-topandas-with-arrow-enabled-to-show-metrics-in-query-ui-4--4>[3.0][SPARK-31903][SQL][PYSPARK][R]
>> Fix toPandas with Arrow enabled to show metrics in Query UI (+4, -4)>
>> <https://github.com/apache/spark/commit/632b5bce23c94d25712b43be83252b34ebfd3e72>
>>
>> In Dataset.collectAsArrowToR and Dataset.collectAsArrowToPython, since
>> the code block for serveToStream runs in a separate thread,
>> withAction finishes as soon as it starts the thread. As a result, it
>> doesn't collect the metrics of the actual action, and the Query UI shows
>> the plan graph without metrics. This PR fixes the issue.
>>
>> The affected functions are:
>>
>>    - collect() in SparkR
>>    - DataFrame.toPandas() in PySpark
>>
>>
>> <https://github.com/databricks/runtime/wiki/OSS-Digest-June-3-~-June-9,-2020#70spark-31886webui-fix-the-wrong-coloring-of-nodes-in-dag-viz-33--3>[3.0][SPARK-31886][WEBUI]
>> Fix the wrong coloring of nodes in DAG-viz (+33, -3)>
>> <https://github.com/apache/spark/commit/8ed93c9355bc2af6fe456d88aa693c8db69d0bbf>
>>
>> On the Job page and Stage page, nodes associated with "barrier mode" in
>> the DAG-viz are colored pale green. However, for some types of jobs, nodes
>> not associated with barrier mode were also colored. This PR fixes that.
>>
>> <https://github.com/databricks/runtime/wiki/OSS-Digest-June-3-~-June-9,-2020#80spark-29431webui-improve-web-ui--sql-tab-visualization-with-cached-dataframes-46--0>[3.1][SPARK-29431][WEBUI]
>> Improve Web UI / Sql tab visualization with cached dataframes (+46, -0)>
>> <https://github.com/apache/spark/commit/e4db3b5b1742b4bdfa32937273e5d07a76cde79b>
>>
>> Display the query plan of cached DataFrames as well in the web UI.
>>
>> [2.4][SPARK-31967][UI] Downgrade to vis.js 4.21.0 to fix Jobs UI loading
>> time regression (+49, -86)>
>> <https://github.com/apache/spark/commit/f535004e14b197ceb1f2108a67b033c052d65bcb>
>>
>> Fix a serious performance issue in the web UI by rolling vis-timeline-graph2d
>> back to 4.21.0.
>>
>> <https://github.com/databricks/runtime/wiki/OSS-Digest-June-10-~-June-16,-2020#80spark-30119webui-support-pagination-for-streaming-tab-259--178>[3.1][SPARK-30119][WEBUI]
>> Support pagination for streaming tab (+259, -178)>
>> <https://github.com/apache/spark/commit/9b098f1eb91a5e9f488d573bfeea3f6bfd9b95b3>
>>
>> The PR adds pagination support for the streaming tab.
>>
>> <https://github.com/databricks/runtime/wiki/OSS-Digest-June-10-~-June-16,-2020#80spark-31642followup-fix-sorting-for-duration-column-and-make-status-column-sortable-7--6>[3.1][SPARK-31642][FOLLOWUP]
>> Fix Sorting for duration column and make Status column sortable (+7, -6)>
>> <https://github.com/apache/spark/commit/f5f6eee3045e90e02fc7e999f616b5a021d7c724>
>>
>> The PR improves the pagination support in the streaming UI by fixing the
>> wrong sorting of the Duration column and making the Status column sortable.
>>
>>
>>
>
> --
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>
