lwz9103 opened a new issue, #9351:
URL: https://github.com/apache/incubator-gluten/issues/9351

   ### Backend
   
   CH (ClickHouse)
   
   ### Bug description
   
   On the ClickHouse backend, the following query, which applies `session_window` on top of a tumbling-window aggregation, fails at columnar-to-row conversion with `Doesn't support type AggregateFunction(sum, Nullable(Int64)) for writeValue` (full stack trace under "Relevant logs" below):
   
    ```scala
       withTempView("clicks") {
         val df = Seq(
           // small window: [00:00, 01:00), user1, 2
           ("2024-09-30 00:00:00", "user1"), ("2024-09-30 00:00:30", "user1"),
           // small window: [01:00, 02:00), user2, 2
           ("2024-09-30 00:01:00", "user2"), ("2024-09-30 00:01:30", "user2"),
           // small window: [03:00, 04:00), user1, 1
           ("2024-09-30 00:03:30", "user1"),
           // small window: [11:00, 12:00), user1, 3
           ("2024-09-30 00:11:00", "user1"), ("2024-09-30 00:11:30", "user1"),
           ("2024-09-30 00:11:45", "user1")
         ).toDF("eventTime", "userId")
   
      // session window: (01:00, 09:00), user1, 3 / (02:00, 07:00), user2, 2 /
      //   (12:00, 12:05), user1, 3
   
         df.createOrReplaceTempView("clicks")
   
         val aggregatedData = spark.sql(
           """
             |  SELECT
             |    session_window(small_window, '5 minutes') AS session,
             |    userId,
             |    sum(numClicks) AS numClicks
             |  FROM
             |  (
             |    SELECT
             |      window(eventTime, '1 minute') AS small_window,
             |      userId,
             |      count(*) AS numClicks
             |    FROM clicks
             |    GROUP BY window, userId
             |  ) cpu_small
             |  GROUP BY session_window, userId
             |""".stripMargin)
   
         checkAnswer(
           aggregatedData,
           Seq(Row("user1", 3), Row("user2", 2))
         )
       }
   ```
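   
   For triage, the pattern can be reduced to a standalone job. The sketch below is illustrative, not taken from the report: the object name, the session setup, and the `show` call are assumptions, and the Gluten/CH backend configuration is assumed to be supplied externally (e.g. via `--conf`). It keeps the shape that matters: a `session_window` aggregation consuming the output of a tumbling `window` aggregation.
   
    ```scala
    import org.apache.spark.sql.SparkSession
    
    // Hypothetical standalone harness; the original snippet runs inside a
    // Gluten test suite (withTempView/checkAnswer).
    object SessionWindowRepro {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("gluten-ch-session-window-repro")
          .getOrCreate()
        import spark.implicits._
    
        Seq(
          ("2024-09-30 00:00:00", "user1"), ("2024-09-30 00:00:30", "user1"),
          ("2024-09-30 00:01:00", "user2"), ("2024-09-30 00:01:30", "user2"),
          ("2024-09-30 00:03:30", "user1"),
          ("2024-09-30 00:11:00", "user1"), ("2024-09-30 00:11:30", "user1"),
          ("2024-09-30 00:11:45", "user1")
        ).toDF("eventTime", "userId").createOrReplaceTempView("clicks")
    
        // Same query shape as the failing test: session_window over the
        // result of a 1-minute tumbling-window aggregation.
        spark.sql(
          """
            |SELECT
            |  session_window(small_window, '5 minutes') AS session,
            |  userId,
            |  sum(numClicks) AS numClicks
            |FROM (
            |  SELECT
            |    window(eventTime, '1 minute') AS small_window,
            |    userId,
            |    count(*) AS numClicks
            |  FROM clicks
            |  GROUP BY window, userId
            |) cpu_small
            |GROUP BY session_window, userId
            |""".stripMargin).show(truncate = false)
    
        spark.stop()
      }
    }
    ```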
   
   ### Gluten version
   
   _No response_
   
   ### Spark version
   
   None
   
   ### Spark configurations
   
   _No response_
   
   ### System information
   
   _No response_
   
   ### Relevant logs
   
   ```bash
    Caused by: org.apache.gluten.exception.GlutenException: Doesn't support type AggregateFunction(sum, Nullable(Int64)) for writeValue
    0. Poco::Exception::Exception(String const&, int) @ 0x00000000163162b2
    1. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000d35b679
    2. DB::Exception::Exception(PreformattedMessage&&, int) @ 0x0000000006d2ceec
    3. DB::Exception::Exception<String>(int, FormatStringHelperImpl<std::type_identity<String>::type>, String&&) @ 0x0000000006d33bcb
    4. local_engine::CHColumnToSparkRow::convertCHColumnToSparkRow(DB::Block const&, std::unique_ptr<std::vector<unsigned long, std::allocator<unsigned long>>, std::default_delete<std::vector<unsigned long, std::allocator<unsigned long>>>> const&) @ 0x000000000d780086
    5. Java_org_apache_gluten_vectorized_CHBlockConverterJniWrapper_convertColumnarToRow @ 0x0000000006d1ca55
    
         at org.apache.gluten.vectorized.CHBlockConverterJniWrapper.convertColumnarToRow(Native Method)
         at org.apache.spark.sql.execution.utils.CHExecUtil$.getRowIterFromSparkRowInfo(CHExecUtil.scala:160)
         at org.apache.spark.sql.execution.utils.CHExecUtil$.c2r(CHExecUtil.scala:169)
         at org.apache.spark.sql.execution.CHColumnarToRowRDD.$anonfun$f$4(CHColumnarToRowExec.scala:105)
         at org.apache.gluten.utils.Arm$.withResource(Arm.scala:25)
         at org.apache.gluten.metrics.GlutenTimeMetric$.millis(GlutenTimeMetric.scala:37)
         at org.apache.spark.sql.execution.CHColumnarToRowRDD.$anonfun$f$2(CHColumnarToRowExec.scala:105)
         at scala.collection.Iterator$$anon$10.nextCur(Iterator.scala:587)
         at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:601)
         at org.apache.spark.sql.execution.aggregate.MergingSessionsExec.$anonfun$doExecute$1(MergingSessionsExec.scala:71)
         at org.apache.spark.sql.execution.aggregate.MergingSessionsExec.$anonfun$doExecute$1$adapted(MergingSessionsExec.scala:68)
         at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndexInternal$2(RDD.scala:881)
         at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndexInternal$2$adapted(RDD.scala:881)
         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
   ```
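   
   The trace shows the native columnar-to-row conversion being driven by Spark's row-based `MergingSessionsExec`: the CH backend's partial aggregate output, including the `AggregateFunction(sum, Nullable(Int64))` state column, is handed back to a vanilla Spark operator, and `CHColumnToSparkRow::convertCHColumnToSparkRow` cannot serialize that state column into Spark rows. As a triage step (not from the report), one could check that the same query succeeds with Gluten disabled; this sketch assumes the session-level `spark.gluten.enabled` switch can be toggled per query:
   
    ```scala
    // Assumption (not stated in the report): toggling spark.gluten.enabled
    // off makes the same SQL fall back to vanilla Spark execution. A passing
    // fallback run would confirm the failure is specific to the CH backend's
    // row conversion of aggregate-state columns.
    spark.conf.set("spark.gluten.enabled", "false")
    // `sessionWindowSql` is a hypothetical val holding the SQL text above.
    spark.sql(sessionWindowSql).show(truncate = false)
    ```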

