Thanks for letting us know about this issue.

On 8/16/20 11:31 PM, Takeshi Yamamuro wrote:
I've checked the Jenkins log and it seems the commit from https://github.com/apache/spark/pull/29404 caused the failure.


On Sat, Aug 15, 2020 at 10:43 PM Koert Kuipers <ko...@tresata.com> wrote:

    i noticed a commit today that seems to prepare for 3.0.1-rc1:
    commit 05144a5c10cd37ebdbb55fde37d677def49af11f
    Author: Ruifeng Zheng <ruife...@apache.org>
    Date:   Sat Aug 15 01:37:47 2020 +0000

        Preparing Spark release v3.0.1-rc1

    so i tried to build spark on that commit and i get a failure in sql:

    09:36:57.371 ERROR org.apache.spark.scheduler.TaskSetManager: Task 0 in stage 77.0 failed 1 times; aborting job
    [info] - SPARK-28224: Aggregate sum big decimal overflow *** FAILED *** (306 milliseconds)
    [info]   org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 77.0 failed 1 times, most recent failure: Lost task 0.0 in stage 77.0 (TID 197, 192.168.11.17, executor driver): java.lang.ArithmeticException: Decimal(expanded,111111111111111111110.246000000000000000,39,18}) cannot be represented as Decimal(38, 18).
    [info] at org.apache.spark.sql.types.Decimal.toPrecision(Decimal.scala:369)
    [info] at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.agg_doAggregate_sum_0$(Unknown Source)
    [info] at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.agg_doConsume_0$(Unknown Source)
    [info] at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.agg_doAggregateWithoutKey_0$(Unknown Source)
    [info] at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
    [info] at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    [info] at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729)
    [info] at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
    [info] at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
    [info] at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1804)
    [info] at org.apache.spark.rdd.RDD.$anonfun$count$1(RDD.scala:1227)
    [info] at org.apache.spark.rdd.RDD.$anonfun$count$1$adapted(RDD.scala:1227)
    [info] at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2138)
    [info] at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    [info] at org.apache.spark.scheduler.Task.run(Task.scala:127)
    [info] at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
    [info] at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
    [info] at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
    [info] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    [info] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    [info] at java.lang.Thread.run(Thread.java:748)

    [error] Failed tests:
    [error] org.apache.spark.sql.DataFrameSuite
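For reference, the arithmetic behind this failure can be sketched without Spark at all: a Decimal(38, 18) has 38 total digits with 18 after the decimal point, leaving 20 for the integer part, so summing values near that limit pushes the result to 39 digits. A minimal plain-Scala sketch (`fits` is a hypothetical helper for illustration, not a Spark API):

```scala
import java.math.{BigDecimal => JBigDecimal, RoundingMode}

object DecimalOverflowSketch {
  // True when `v`, rescaled to `scale` fractional digits, still fits in
  // `precision` total digits -- roughly the check enforced at the
  // Decimal.toPrecision frame in the trace above.
  def fits(v: JBigDecimal, precision: Int = 38, scale: Int = 18): Boolean =
    v.setScale(scale, RoundingMode.HALF_UP).precision <= precision

  // 20 integer digits + 18 fractional digits = 38 total: representable.
  val one = new JBigDecimal("50000000000000000000.123456789012345678")

  // Three such values sum to 21 integer digits (39 total), which is why
  // the aggregate raises "cannot be represented as Decimal(38, 18)".
  val sum = one.add(one).add(one)
}
```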

    On Thu, Aug 13, 2020 at 8:19 PM Jason Moore <jason.mo...@quantium.com.au.invalid> wrote:

        Thank you so much!  Any update on getting the RC1 up for vote?

        Jason.


        ------------------------------------------------------------------------
        *From:* 郑瑞峰 <ruife...@foxmail.com>
        *Sent:* Wednesday, 5 August 2020 12:54 PM
        *To:* Jason Moore <jason.mo...@quantium.com.au.INVALID>; Spark dev list <dev@spark.apache.org>
        *Subject:* Re: [DISCUSS] Apache Spark 3.0.1 Release
        Hi all,
        I am going to prepare the release of 3.0.1 RC1, with the help of Wenchen.


        ------------------ Original Message ------------------
        *From:* "Jason Moore" <jason.mo...@quantium.com.au.INVALID>;
        *Sent:* Thursday, 30 July 2020, 10:35 AM
        *To:* "dev" <dev@spark.apache.org>;
        *Subject:* Re: [DISCUSS] Apache Spark 3.0.1 Release

        Hi all,

        Discussion around 3.0.1 seems to have trickled away.  What was
        blocking the release process from kicking off?  I can see some
        unresolved bugs raised against 3.0.0, but conversely there were
        quite a few critical correctness fixes waiting to be released.

        Cheers,

        Jason.

        *From: *Takeshi Yamamuro <linguin....@gmail.com>
        *Date: *Wednesday, 15 July 2020 at 9:00 am
        *To: *Shivaram Venkataraman <shiva...@eecs.berkeley.edu>
        *Cc: *"dev@spark.apache.org" <dev@spark.apache.org>
        *Subject: *Re: [DISCUSS] Apache Spark 3.0.1 Release

        > Just wanted to check if there are any blockers that we are
        still waiting for to start the new release process.

        I don't see any on-going blocker in my area.

        Thanks for the notification.

        Bests,

        Takeshi

        On Wed, Jul 15, 2020 at 4:03 AM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:

            Hi, Yi.

            Could you explain why you think that is a blocker? For the
            given example from the JIRA description,

                spark.udf.register("key", udf((m: Map[String, String]) => m.keys.head.toInt))

                Seq(Map("1"-> "one", "2"-> "two")).toDF("a").createOrReplaceTempView("t")

                checkAnswer(sql("SELECT key(a) AS k FROM t GROUP BY key(a)"), Row(1) :: Nil)

            Apache Spark 3.0.0 seems to work like the following.

                scala> spark.version

                res0: String = 3.0.0

                scala> spark.udf.register("key", udf((m: Map[String, String]) => m.keys.head.toInt))

                res1: org.apache.spark.sql.expressions.UserDefinedFunction = SparkUserDefinedFunction($Lambda$1958/948653928@5d6bed7b,IntegerType,List(Some(class[value[0]: map<string,string>])),None,false,true)

                scala> Seq(Map("1" -> "one", "2" -> "two")).toDF("a").createOrReplaceTempView("t")

                scala> sql("SELECT key(a) AS k FROM t GROUP BY key(a)").collect

                res3: Array[org.apache.spark.sql.Row] = Array([1])
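The expected semantics of that snippet can also be checked with plain Scala collections, using the UDF's body directly as the grouping function (no Spark involved; this mirrors only the logical result, not Spark's group-by execution, and `UdfGroupBySketch` is a name invented for illustration):

```scala
// The same key function the UDF registers, applied via groupBy on a
// plain Seq: each Map is keyed by its first entry's key, parsed as Int.
object UdfGroupBySketch {
  val key: Map[String, String] => Int = m => m.keys.head.toInt

  val rows = Seq(Map("1" -> "one", "2" -> "two"))

  // One group, keyed by 1, matching Array([1]) from the shell session.
  val groupKeys = rows.groupBy(key).keys.toList  // List(1)
}
```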

            Could you provide a reproducible example?

            Bests,

            Dongjoon.

            On Tue, Jul 14, 2020 at 10:04 AM Yi Wu <yi...@databricks.com> wrote:

                This is probably a blocker:
                https://issues.apache.org/jira/browse/SPARK-32307

                On Tue, Jul 14, 2020 at 11:13 PM Sean Owen <sro...@gmail.com> wrote:

                    https://issues.apache.org/jira/browse/SPARK-32234 ?

                    On Tue, Jul 14, 2020 at 9:57 AM Shivaram Venkataraman <shiva...@eecs.berkeley.edu> wrote:
                    >
                    > Hi all
                    >
                    > Just wanted to check if there are any blockers
                    that we are still waiting for to start the new
                    release process.
                    >
                    > Thanks
                    > Shivaram
                    >


        --
        Takeshi Yamamuro



