Copilot commented on code in PR #2129:
URL: https://github.com/apache/auron/pull/2129#discussion_r3007143666
##########
native-engine/datafusion-ext-functions/src/spark_dates.rs:
##########
@@ -46,6 +46,29 @@ pub fn spark_day(args: &[ColumnarValue]) -> Result<ColumnarValue> {
Ok(ColumnarValue::Array(date_part(&input, DatePart::Day)?))
}
+/// `spark_dayofweek(date/timestamp/compatible-string)`
+///
+/// Matches Spark's `dayofweek()` semantics:
+/// Sunday = 1, Monday = 2, ..., Saturday = 7.
+pub fn spark_dayofweek(args: &[ColumnarValue]) -> Result<ColumnarValue> {
+    let input = cast(&args[0].clone().into_array(1)?, &DataType::Date32)?;
+    let input = input
+        .as_any()
+        .downcast_ref::<Date32Array>()
+        .expect("internal cast to Date32 must succeed");
+
+    let epoch = NaiveDate::from_ymd_opt(1970, 1, 1).expect("1970-01-01 must be a valid date");
+    let dayofweek = Int32Array::from_iter(input.iter().map(|opt_days| {
+        opt_days.and_then(|days| {
+            epoch
+                .checked_add_signed(Duration::days(days as i64))
+                .map(|date| date.weekday().num_days_from_sunday() as i32 + 1)
Review Comment:
`spark_dayofweek` computes weekday by materializing a `NaiveDate` per row
(epoch + Duration::days) and uses `checked_add_signed`, which will yield NULL
for Date32 values outside chrono's supported range. This is both slower than
necessary and can diverge from Spark/Date32 semantics for extreme (but
representable) dates. Consider computing day-of-week directly from the Date32
day offset with modular arithmetic (e.g., based on the known weekday of
1970-01-01) to avoid chrono range limits and per-row date construction.
```suggestion
    // Date32 is days since 1970-01-01. 1970-01-01 is a Thursday.
    // If we number weekdays so that Sunday = 0, ..., Saturday = 6,
    // then 1970-01-01 corresponds to 4. For an offset `days`,
    // weekday_index = (days + 4) mod 7 gives 0 = Sunday, ..., 6 = Saturday.
    // Spark wants Sunday = 1, ..., Saturday = 7, so we add 1.
    let dayofweek = Int32Array::from_iter(input.iter().map(|opt_days| {
        opt_days.map(|days| {
            let weekday_index = (days as i64 + 4).rem_euclid(7);
            weekday_index as i32 + 1
```
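The suggested modular arithmetic can be sanity-checked in isolation. Below is a minimal, dependency-free sketch (the helper name `dayofweek_from_days` is hypothetical, introduced here for illustration) showing that the `(days + 4).rem_euclid(7) + 1` formula reproduces Spark's numbering for the epoch, for the test date used in this PR, and for pre-epoch offsets where a plain `%` would go negative:

```rust
/// Hypothetical standalone helper mirroring the suggested arithmetic.
/// Date32 stores days since 1970-01-01 (a Thursday), so
/// (days + 4).rem_euclid(7) yields 0 = Sunday, ..., 6 = Saturday,
/// and Spark's dayofweek() is that index plus one.
fn dayofweek_from_days(days: i32) -> i32 {
    ((days as i64 + 4).rem_euclid(7)) as i32 + 1
}

fn main() {
    // days = 0 is 1970-01-01 itself, a Thursday (Spark: 5).
    assert_eq!(dayofweek_from_days(0), 5);
    // 2009-07-30 (the date in the Scala test) is 14455 days after
    // the epoch and also a Thursday (Spark: 5).
    assert_eq!(dayofweek_from_days(14455), 5);
    // rem_euclid keeps pre-epoch dates correct: days = -1 is
    // 1969-12-31, a Wednesday (Spark: 4); `%` would return -4 here.
    assert_eq!(dayofweek_from_days(-1), 4);
    // The full Date32 range stays in [1, 7]; no chrono range limit.
    let extreme = dayofweek_from_days(i32::MIN);
    assert!((1..=7).contains(&extreme));
    println!("ok");
}
```

Note the `i64` widening before adding 4: it avoids overflow when `days` is near `i32::MAX`.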
##########
spark-extension-shims-spark/src/test/scala/org/apache/auron/AuronFunctionSuite.scala:
##########
@@ -117,6 +117,17 @@ class AuronFunctionSuite extends AuronQueryTest with BaseAuronSQLSuite {
}
}
+  test("dayofweek function") {
+    withTable("t1") {
+      sql("create table t1(c1 date) using parquet")
+      sql("insert into t1 values(date'2009-07-30')")
+
+      checkSparkAnswerAndOperator("select dayofweek(c1) from t1")
+      checkAnswer(sql("select dayofweek(c1) from t1"), Seq(Row(5)))
+      checkAnswer(sql("select dayofweek('2009-07-30')"), Seq(Row(5)))
+    }
Review Comment:
The new test only covers a DATE column input and a string literal. Since the PR claims `dayofweek()` supports TIMESTAMP and is null-safe, please add assertions for (1) NULL input returning NULL and (2) a TIMESTAMP input (column or literal) matching Spark's result, ideally under an explicit `spark.sql.session.timeZone` so the expected value is stable across environments.
--
This is an automated message from the Apache Git Service.