I'd also add [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shut down and
[SPARK-21040][CORE] Speculate tasks which are running on decommissioned executors, two of the PRs merged after the decommissioning SPIP.

On Tue, Jul 21, 2020 at 10:53 AM Xingbo Jiang <jiangxb1...@gmail.com> wrote:

Hi all,

This is the bi-weekly Apache Spark digest from the Databricks OSS team.
For each API/configuration/behavior change, an [API] tag is added in the title.

CORE

[3.0][SPARK-31923][CORE] Ignore internal accumulators that use unrecognized types rather than crashing (+63, -5)
<https://github.com/databricks/runtime/wiki/OSS-Digest-June-3-~-June-9,-2020#70spark-31923core-ignore-internal-accumulators-that-use-unrecognized-types-rather-than-crashing-63--5>
<https://github.com/apache/spark/commit/b333ed0c4a5733a9c36ad79de1d4c13c6cf3c5d4>

A user may name their accumulators with the internal.metrics. prefix, in which case Spark treats them as internal accumulators and hides them from the UI. This change makes JsonProtocol.accumValueToJson more robust, so that it ignores internal accumulators that use unrecognized types rather than crashing.

[API][3.1][SPARK-31486][CORE] spark.submit.waitAppCompletion flag to control spark-submit exit in Standalone Cluster Mode (+88, -26)
<https://github.com/databricks/runtime/wiki/OSS-Digest-June-3-~-June-9,-2020#api80spark-31486core-sparksubmitwaitappcompletion-flag-to-control-spark-submit-exit-in-standalone-cluster-mode-88--26>
<https://github.com/apache/spark/commit/6befb2d8bdc5743d0333f4839cf301af165582ce>

This PR implements an application wait mechanism that allows spark-submit to wait until the application finishes in Standalone cluster mode. It delays the exit of the spark-submit JVM until the job completes, monitoring the application until it is finished, failed, or killed.
This is controlled via the following conf:

- spark.standalone.submit.waitAppCompletion (Default: false)
  In standalone cluster mode, controls whether the client waits to exit until the application completes. If set to true, the client process stays alive, polling the driver's status. Otherwise, the client process exits after submission.

SQL
<https://github.com/databricks/runtime/wiki/OSS-Digest-June-3-~-June-9,-2020#sql>

[3.0][SPARK-31220][SQL] repartition obeys initialPartitionNum when adaptiveExecutionEnabled (+27, -12)
<https://github.com/databricks/runtime/wiki/OSS-Digest-June-3-~-June-9,-2020#71spark-31220sql-repartition-obeys-initialpartitionnum-when-adaptiveexecutionenabled-27--12>
<https://github.com/apache/spark/commit/1d1eacde9d1b6fb75a20e4b909d221e70ad737db>

AQE and non-AQE use different configs to set the initial shuffle partition number. This PR fixes repartition/DISTRIBUTE BY so that, when AQE is enabled, it also uses the AQE config spark.sql.adaptive.coalescePartitions.initialPartitionNum to set the initial shuffle partition number.

[3.0][SPARK-31867][SQL][FOLLOWUP] Check result differences for datetime formatting (+51, -8)
<https://github.com/databricks/runtime/wiki/OSS-Digest-June-3-~-June-9,-2020#70spark-31867sqlfollowup-check-result-differences-for-datetime-formatting-51--8>
<https://github.com/apache/spark/commit/fc6af9d900ec6f6a1cbe8f987857a69e6ef600d1>

Spark should throw SparkUpgradeException when it gets a DateTimeException during datetime formatting under the EXCEPTION legacy time parser policy.
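The SPARK-31867 behavior amounts to translating a low-level datetime error into a migration hint for the user. A minimal pure-Python sketch of that error-translation pattern (the helper name `with_policy` and the message text are illustrative, not Spark's actual code; Python's ValueError plays the role of Java's DateTimeException):

```python
import datetime

class SparkUpgradeException(Exception):
    """Stand-in for Spark's exception hinting at a 2.4 -> 3.0 behavior change."""

def with_policy(fn, value, policy="EXCEPTION"):
    # Hypothetical helper: run a datetime operation and, under the
    # EXCEPTION policy, re-raise low-level failures as an upgrade hint.
    try:
        return fn(value)
    except ValueError as e:  # plays the role of Java's DateTimeException
        if policy == "EXCEPTION":
            raise SparkUpgradeException(
                "result may differ after upgrading to Spark 3.0; "
                "set spark.sql.legacy.timeParserPolicy=LEGACY to restore "
                "the old behavior") from e
        raise

# A value that parses fine passes through untouched.
print(with_policy(lambda v: datetime.datetime.strptime(v, "%Y").year, "2020"))  # 2020

# A value that fails surfaces as the upgrade exception instead.
try:
    with_policy(lambda v: datetime.datetime.strptime(v, "%Y"), "not-a-year")
except SparkUpgradeException as e:
    print("SparkUpgradeException:", e)
```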
[API][3.0][SPARK-31879][SPARK-31892][SQL] Disable week-based pattern letters in datetime parsing/formatting (+1421, -171) (+102, -48)
<https://github.com/databricks/runtime/wiki/OSS-Digest-June-3-~-June-9,-2020#api70spark-31879spark-31892sql-disable-week-based-pattern-letters-in-datetime-parsingformatting-1421--171-102--48>
<https://github.com/apache/spark/commit/9d5b5d0a5849ac329bbae26d9884d8843d8a8571>
<https://github.com/apache/spark/commit/afe95bd9ad7a07c49deecf05f0a1000bb8f80caa>

Week-based pattern letters have very weird behaviors during datetime parsing in Spark 2.4, and it is very hard to simulate the legacy behaviors with the new API. For formatting, the new API makes the start-of-week localized, so it is not possible to keep the legacy behaviors. Since week-based fields are rarely used, week-based pattern letters are disabled in both parsing and formatting.

[3.0][SPARK-31896][SQL] Handle am-pm timestamp parsing when hour is missing (+39, -3)
<https://github.com/databricks/runtime/wiki/OSS-Digest-June-3-~-June-9,-2020#70spark-31896sql-handle-am-pm-timestamp-parsing-when-hour-is-missing-39--3>
<https://github.com/apache/spark/commit/afcc14c6d27f9e0bd113e0d86b64dc6fa4eed551>

This PR sets the hour field to 0 or 12 when the AMPM_OF_DAY field is AM or PM during datetime parsing, keeping the behavior the same as Spark 2.4.

[API][3.1][SPARK-31830][SQL] Consistent error handling for datetime formatting and parsing functions (+126, -580)
<https://github.com/databricks/runtime/wiki/OSS-Digest-June-3-~-June-9,-2020#api80spark-31830sql-consistent-error-handling-for-datetime-formatting-and-parsing-functions-126--580>
<https://github.com/apache/spark/commit/6a424b93e5bdb79b1f1310cf48bd034397779e14>

When parsing/formatting datetime values, it is better to fail fast if the pattern string is invalid, instead of returning null for each input record.
The formatting functions such as date_format already do this; this PR applies the fail-fast behavior to the parsing functions: from_unixtime, unix_timestamp, to_unix_timestamp, to_timestamp and to_date.

[3.1][SPARK-31910][SQL] Enable Java 8 time API in Thrift server (+23, -0)
<https://github.com/databricks/runtime/wiki/OSS-Digest-June-3-~-June-9,-2020#80spark-31910sql-enable-java-8-time-api-in-thrift-server-23--0>
<https://github.com/apache/spark/commit/2c9988eaf31b7ebd97f2c2904ed7ee531eff0d20>

This PR enables the Java 8 time API in the Thrift server, so that the session timezone is used more consistently.

[2.4][SPARK-31935][SQL] Hadoop file system config should be effective in data source options (+52, -7)
<https://github.com/databricks/runtime/wiki/OSS-Digest-June-3-~-June-9,-2020#55spark-31935sql-hadoop-file-system-config-should-be-effective-in-data-source-options-52--7>
<https://github.com/apache/spark/commit/f3771c6b47d0b3aef10b86586289a1f675c7cfe2>

This PR fixes a bug where the Hadoop configs in read/write options are not respected in data source V1.

[API][2.4][SPARK-31968][SQL] Duplicate partition columns check when writing data (+12, -1)
<https://github.com/apache/spark/commit/a4ea599b1b9b8ebaae0100b54e6ac1d7576c6d8c>

Adds a check for duplicate partition columns when writing to the built-in file sources. After the change, users get an AnalysisException when writing a DataFrame with duplicate partition columns. Previously, the write would succeed, but reading the files with duplicate columns back would fail.

[API][3.0][SPARK-26905][SQL] Add TYPE in the ANSI non-reserved list (+2, -0)
<https://github.com/databricks/runtime/wiki/OSS-Digest-June-10-~-June-16,-2020#api71spark-26905sql-add-type-in-the-ansi-non-reserved-list-2--0>
<https://github.com/apache/spark/commit/e14029b18df10db5094f8abf8b9874dbc9186b4e>

Add TYPE to the ANSI non-reserved keyword list to follow the ANSI/SQL standard.
The change impacts the behavior only when ANSI mode is on (spark.sql.ansi.enabled=true).

[API][3.0][SPARK-26905][SQL] Follow the SQL:2016 reserved keywords (+429, -5)
<https://github.com/databricks/runtime/wiki/OSS-Digest-June-10-~-June-16,-2020#api71spark-26905sql-follow-the-sql2016-reserved-keywords-429--5>
<https://github.com/apache/spark/commit/3698a14204dd861ea3ee3c14aa923123b52caba1>

Move the keywords ANTI, SEMI, and MINUS from reserved to non-reserved to comply with the ANSI/SQL standard. The change impacts the behavior only when ANSI mode is on (spark.sql.ansi.enabled=true).

[API][3.0][SPARK-31939][SQL][TEST-JAVA11] Fix parsing day of year when year field pattern is missing (+465, -3)
<https://github.com/databricks/runtime/wiki/OSS-Digest-June-10-~-June-16,-2020#api70spark-31939sqltest-java11-fix-parsing-day-of-year-when-year-field-pattern-is-missing-465--3>
<https://github.com/apache/spark/commit/22dda6e18e91c6db6fa8ff9fafaafe09a79db4ea>

When a datetime pattern does not contain a year field (i.e. 'yyyy') but contains the day-of-year field (i.e. 'DD'), Spark should still respect the datetime pattern and parse the constants.
Before the change:

spark-sql> select to_timestamp('31', 'DD');
1970-01-01 00:00:00
spark-sql> select to_timestamp('31 30', 'DD dd');
1970-01-30 00:00:00

After the change:

spark-sql> select to_timestamp('31', 'DD');
1970-01-31 00:00:00
spark-sql> select to_timestamp('31 30', 'DD dd');
NULL

[3.0][SPARK-31956][SQL] Do not fail if there is no ambiguous self join (+7, -2)
<https://github.com/databricks/runtime/wiki/OSS-Digest-June-10-~-June-16,-2020#70spark-31956sql-do-not-fail-if-there-is-no-ambiguous-self-join-7--2>
<https://github.com/apache/spark/commit/c40051932290db3a63f80324900a116019b1e589>

df("col").as("name") is not a column reference anymore, and should not carry the special column metadata that is used to identify the root attribute (e.g., Dataset ID and column position). This PR fixes the corresponding regression, which could cause a DataFrame operation to fail even when there is no ambiguous self-join. An example:

val joined = df.join(spark.range(1)).select($"a")
joined.select(joined("a").alias("x"), sum(joined("a")).over(w))

[3.0][SPARK-31958][SQL] Normalize special floating numbers in subquery (+18, -4)
<https://github.com/databricks/runtime/wiki/OSS-Digest-June-10-~-June-16,-2020#70spark-31958sql-normalize-special-floating-numbers-in-subquery-18--4>
<https://github.com/apache/spark/commit/6fb9c80da129d0b43f9ff5b8be6ce8bad992a4ed>

The PR fixes a bug where special floating-point numbers in non-correlated subquery expressions were not handled; such subquery expressions are now handled by OptimizeSubqueries.
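The "special floating numbers" above are values such as NaN and -0.0, which break grouping and equality semantics unless normalized first. A minimal pure-Python sketch of why such normalization matters (an illustration of the problem, not Spark's implementation):

```python
import math
import struct

def normalize(x: float) -> float:
    """Map every NaN to one canonical NaN and -0.0 to 0.0, so that
    normalized values can safely serve as grouping/join keys."""
    if math.isnan(x):
        return float("nan")  # one canonical NaN bit pattern
    if x == 0.0:
        return 0.0           # collapses -0.0 into +0.0
    return x

def bits(x: float) -> bytes:
    # Grouping keys end up compared by binary representation,
    # where -0.0 and +0.0 are different byte strings.
    return struct.pack("<d", x)

print(bits(-0.0) == bits(0.0))                        # False: distinct keys
print(bits(normalize(-0.0)) == bits(normalize(0.0)))  # True after normalization
```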
[API][3.1][SPARK-21117][SQL] Built-in SQL Function Support - WIDTH_BUCKET (+431, -30)
<https://github.com/databricks/runtime/wiki/OSS-Digest-June-10-~-June-16,-2020#api80spark-21117sql-built-in-sql-function-support---width_bucket-431--30>
<https://github.com/apache/spark/commit/b1adc3deee00058cba669534aee156dc7af243dc>

Add a built-in SQL function WIDTH_BUCKET, which returns the bucket number to which a value would be assigned in an equiwidth histogram with num_bucket buckets, in the range min_value to max_value. Examples:

SELECT WIDTH_BUCKET(5.3, 0.2, 10.6, 5);
3
SELECT WIDTH_BUCKET(-2.1, 1.3, 3.4, 3);
0
SELECT WIDTH_BUCKET(8.1, 0.0, 5.7, 4);
5
SELECT WIDTH_BUCKET(-0.9, 5.2, 0.5, 2);
3

[3.1][SPARK-27217][SQL] Nested column aliasing for more operators which can prune nested column (+190, -10)
<https://github.com/databricks/runtime/wiki/OSS-Digest-June-10-~-June-16,-2020#80spark-27217sql-nested-column-aliasing-for-more-operators-which-can-prune-nested-column-190--10>
<https://github.com/apache/spark/commit/43063e2db2bf7469f985f1954d8615b95cf5c578>

Support nested column pruning from an Aggregate or Expand operator.

[3.1][SPARK-27633][SQL] Remove redundant aliases in NestedColumnAliasing (+43, -1)
<https://github.com/databricks/runtime/wiki/OSS-Digest-June-10-~-June-16,-2020#80spark-27633sql-remove-redundant-aliases-in-nestedcolumnaliasing-43--1>
<https://github.com/apache/spark/commit/8282bbf12d4e174986a649023ce3984aae7d7755>

Avoid generating redundant aliases when the parent nested field is aliased in the NestedColumnAliasing rule. This slightly improves performance.
[3.1][SPARK-31736][SQL] Nested column aliasing for RepartitionByExpression/Join (+197, -16)
<https://github.com/databricks/runtime/wiki/OSS-Digest-June-10-~-June-16,-2020#80spark-31736sql-nested-column-aliasing-for-repartitionbyexpressionjoin-197--16>
<https://github.com/apache/spark/commit/ff89b1114319e783eb4f4187bf2583e5e21c64e4>

Support nested column pruning from a RepartitionByExpression or Join operator.

ML

[3.1][SPARK-31925][ML] Summary.totalIterations greater than maxIters (+43, -12)
<https://github.com/databricks/runtime/wiki/OSS-Digest-June-10-~-June-16,-2020#80spark-31925ml-summarytotaliterations-greater-than-maxiters-43--12>
<https://github.com/apache/spark/commit/f83cb3cbb3ce3f22fd122bce620917bfd0699ce7>

The PR fixes a correctness issue in LogisticRegression and LinearRegression: the actual number of training iterations was larger by 1 than the specified maxIter.

[3.1][SPARK-31944] Add instance weight support in LinearRegressionSummary (+56, -24)
<https://github.com/databricks/runtime/wiki/OSS-Digest-June-10-~-June-16,-2020#80spark-31944-add-instance-weight-support-in-linearregressionsummary-56--24>
<https://github.com/apache/spark/commit/89c98a4c7068734e322d335cb7c9f22379ff00e8>

The PR adds instance weight support in LinearRegressionSummary; instance weight is already supported by LinearRegression and RegressionMetrics.

SS
<https://github.com/databricks/runtime/wiki/OSS-Digest-June-10-~-June-16,-2020#ss>

[3.0][SPARK-31593][SS] Remove unnecessary streaming query progress update (+58, -7)
<https://github.com/databricks/runtime/wiki/OSS-Digest-June-10-~-June-16,-2020#71spark-31593ss-remove-unnecessary-streaming-query-progress-update-58--7>
<https://github.com/apache/spark/commit/1e40bccf447dccad9d31bccc75d21b8fca77ba52>

The PR fixes a bug that sets incorrect metrics in Structured Streaming. A progress update should be made every 10 seconds when a stream doesn't have any new data upstream.
Without the fix, the input information is zeroed out but the output information is not when making such a progress update.

[3.0][SPARK-31990][SS] Use toSet.toSeq in Dataset.dropDuplicates (+3, -1)
<https://github.com/databricks/runtime/wiki/OSS-Digest-June-10-~-June-16,-2020#70spark-31990ss-use-tosettoseq-in-datasetdropduplicates-3--1>
<https://github.com/apache/spark/commit/7f7b4dd5199e7c185aedf51fccc400c7072bed05>

The PR proposes to preserve the input order of colNames for groupCols in Dataset.dropDuplicates, because Streaming's state store depends on the groupCols order.

[3.1][SPARK-24634][SS] Add a new metric regarding number of inputs later than watermark plus allowed delay (+94, -29)
<https://github.com/databricks/runtime/wiki/OSS-Digest-June-10-~-June-16,-2020#80spark-24634ss-add-a-new-metric-regarding-number-of-inputs-later-than-watermark-plus-allowed-delay-94--29>
<https://github.com/apache/spark/commit/84815d05503460d58b85be52421d5923474aa08b>

Add a new metric numLateInputs to count the number of inputs which are later than the watermark plus the allowed delay ('inputs' are relative to operators). The new metric is provided both on the Spark UI (SQL tab, query execution details page) and in the Streaming Query Listener.

PYTHON
<https://github.com/databricks/runtime/wiki/OSS-Digest-June-3-~-June-9,-2020#python>

[API][3.0][SPARK-31895][PYTHON][SQL] Support DataFrame.explain(extended: str) case to be consistent with Scala side (+24, -11)
<https://github.com/databricks/runtime/wiki/OSS-Digest-June-3-~-June-9,-2020#api70spark-31895pythonsql-support-dataframeexplainextended-str-case-to-be-consistent-with-scala-side-24--11>
<https://github.com/apache/spark/commit/e1d52011401c1989f26b230eb8c82adc63e147e7>

Improves DataFrame.explain in PySpark so that it also accepts an explain-mode string, consistent with the Scala API.
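The order-sensitivity issue behind SPARK-31990 above can be illustrated in plain Python: deduplicating the column list must preserve the caller's input order, because downstream state (here, the streaming state store) keys off that order. A sketch of the two behaviors (an analogy, not Spark's Scala code):

```python
def dedup_unordered(cols):
    # A plain set discards the caller's column order (analogous to the bug);
    # sorted() here just makes the reordering visible and deterministic.
    return sorted(set(cols))

def dedup_preserving(cols):
    # dict preserves insertion order (Python 3.7+), so the first occurrence
    # of each column keeps its original position.
    return list(dict.fromkeys(cols))

cols = ["value", "key", "value"]
print(dedup_unordered(cols))   # ['key', 'value'] -- order changed
print(dedup_preserving(cols))  # ['value', 'key'] -- input order kept
```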
[3.0][SPARK-31915][SQL][PYTHON] Resolve the grouping column properly per the case sensitivity in grouped and cogrouped pandas UDFs (+37, -8)
<https://github.com/apache/spark/commit/00d06cad564d5e3e5f78a687776d02fe0695a861>

The PR proposes to resolve the grouping attributes separately first, so they can be properly referred to when FlatMapGroupsInPandas and FlatMapCoGroupsInPandas are resolved without ambiguity. Example:

from pyspark.sql.functions import *

df = spark.createDataFrame([[1, 1]], ["column", "Score"])

@pandas_udf("column integer, Score float", PandasUDFType.GROUPED_MAP)
def my_pandas_udf(pdf):
    return pdf.assign(Score=0.5)

df.groupby('COLUMN').apply(my_pandas_udf).show()

df1 = spark.createDataFrame([(1, 1)], ("column", "value"))
df2 = spark.createDataFrame([(1, 1)], ("column", "value"))

df1.groupby("COLUMN").cogroup(
    df2.groupby("COLUMN")
).applyInPandas(lambda r, l: r + l, df1.schema).show()

Before:

pyspark.sql.utils.AnalysisException: Reference 'COLUMN' is ambiguous, could be: COLUMN, COLUMN.;

pyspark.sql.utils.AnalysisException: cannot resolve '`COLUMN`' given input columns: [COLUMN, COLUMN, value, value];;
'FlatMapCoGroupsInPandas ['COLUMN], ['COLUMN], <lambda>(column#9L, value#10L, column#13L, value#14L), [column#22L, value#23L]
:- Project [COLUMN#9L, column#9L, value#10L]
:  +- LogicalRDD [column#9L, value#10L], false
+- Project [COLUMN#13L, column#13L, value#14L]
   +- LogicalRDD [column#13L, value#14L], false

After:

+------+-----+
|column|Score|
+------+-----+
|     1|  0.5|
+------+-----+

+------+-----+
|column|value|
+------+-----+
|     2|    2|
+------+-----+

[3.1][SPARK-31945][SQL][PYSPARK] Enable cache for the same Python function (+25, -4)
<https://github.com/databricks/runtime/wiki/OSS-Digest-June-10-~-June-16,-2020#80spark-31945sqlpyspark-enable-cache-for-the-same-python-function-25--4>
<https://github.com/apache/spark/commit/032d17933b4009ed8a9d70585434ccdbf4d1d7df>

This PR proposes to make PythonFunction hold Seq[Byte] instead of Array[Byte], so that the serialized function bytes can be compared by value. With this change, the cache manager detects that two functions are the same and reuses the existing cache entry if one exists.

[3.1][SPARK-31964][PYTHON] Use Pandas is_categorical on Arrow category type conversion (+2, -5)
<https://github.com/databricks/runtime/wiki/OSS-Digest-June-10-~-June-16,-2020#80spark-31964python-use-pandas-is_categorical-on-arrow-category-type-conversion-2--5>
<https://github.com/apache/spark/commit/b7ef5294f17d54e7d90e36a4be02e8bd67200144>

When using PyArrow to convert a Pandas categorical column, use is_categorical instead of trying to import CategoricalDtype, because the former is a more stable API.

UI
<https://github.com/databricks/runtime/wiki/OSS-Digest-June-3-~-June-9,-2020#ui>

[3.0][SPARK-31903][SQL][PYSPARK][R] Fix toPandas with Arrow enabled to show metrics in Query UI (+4, -4)
<https://github.com/databricks/runtime/wiki/OSS-Digest-June-3-~-June-9,-2020#70spark-31903sqlpysparkr-fix-topandas-with-arrow-enabled-to-show-metrics-in-query-ui-4--4>
<https://github.com/apache/spark/commit/632b5bce23c94d25712b43be83252b34ebfd3e72>

In Dataset.collectAsArrowToR and Dataset.collectAsArrowToPython, the code block for serveToStream runs in a separate thread, so withAction finishes as soon as it starts the thread. As a result, it doesn't collect the metrics of the actual action, and the Query UI shows the plan graph without metrics. This PR fixes the issue.
The affected functions are:

- collect() in SparkR
- DataFrame.toPandas() in PySpark

[3.0][SPARK-31886][WEBUI] Fix the wrong coloring of nodes in DAG-viz (+33, -3)
<https://github.com/databricks/runtime/wiki/OSS-Digest-June-3-~-June-9,-2020#70spark-31886webui-fix-the-wrong-coloring-of-nodes-in-dag-viz-33--3>
<https://github.com/apache/spark/commit/8ed93c9355bc2af6fe456d88aa693c8db69d0bbf>

In the Job Page and Stage Page, nodes associated with "barrier mode" in the DAG-viz are colored pale green. But for some types of jobs, nodes that are not associated with the mode were also colored. This PR fixes that.

[3.1][SPARK-29431][WEBUI] Improve Web UI / SQL tab visualization with cached dataframes (+46, -0)
<https://github.com/databricks/runtime/wiki/OSS-Digest-June-3-~-June-9,-2020#80spark-29431webui-improve-web-ui--sql-tab-visualization-with-cached-dataframes-46--0>
<https://github.com/apache/spark/commit/e4db3b5b1742b4bdfa32937273e5d07a76cde79b>

Display the query plan of cached DataFrames as well in the web UI.

[2.4][SPARK-31967][UI] Downgrade to vis.js 4.21.0 to fix Jobs UI loading time regression (+49, -86)
<https://github.com/apache/spark/commit/f535004e14b197ceb1f2108a67b033c052d65bcb>

Fix a serious performance issue in the web UI by rolling vis-timeline-graph2d back to 4.21.0.

[3.1][SPARK-30119][WEBUI] Support pagination for streaming tab (+259, -178)
<https://github.com/databricks/runtime/wiki/OSS-Digest-June-10-~-June-16,-2020#80spark-30119webui-support-pagination-for-streaming-tab-259--178>
<https://github.com/apache/spark/commit/9b098f1eb91a5e9f488d573bfeea3f6bfd9b95b3>

The PR adds pagination support for the streaming tab.
[3.1][SPARK-31642][FOLLOWUP] Fix Sorting for duration column and make Status column sortable (+7, -6)
<https://github.com/databricks/runtime/wiki/OSS-Digest-June-10-~-June-16,-2020#80spark-31642followup-fix-sorting-for-duration-column-and-make-status-column-sortable-7--6>
<https://github.com/apache/spark/commit/f5f6eee3045e90e02fc7e999f616b5a021d7c724>

The PR improves the pagination support in the streaming tab by fixing a wrong sort order in the Duration column and making the Status column sortable.

--
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau