coderfender commented on code in PR #3269:
URL: https://github.com/apache/datafusion-comet/pull/3269#discussion_r2743893503
##########
spark/src/test/scala/org/apache/comet/CometStringExpressionSuite.scala:
##########
@@ -391,4 +395,38 @@ class CometStringExpressionSuite extends CometTestBase {
}
}
+ test("elt") {
+ withSQLConf(SQLConf.ANSI_ENABLED.key -> "false") {
+ val r = new Random(42)
+ val fieldsCount = 10
+ val indexes = Seq.range(1, fieldsCount)
+ val edgeCasesIndexes = Seq(-1, 0, -100, fieldsCount + 100)
+ val schema = indexes
+ .foldLeft(new StructType())((schema, idx) =>
+ schema.add(s"c$idx", StringType, nullable = true))
+ val df = FuzzDataGenerator.generateDataFrame(
+ r,
+ spark,
+ schema,
+ 100,
+ DataGenOptions(maxStringLength = 6))
+ df.withColumn(
+ "idx",
+ lit(Random.shuffle(indexes ++
edgeCasesIndexes).headOption.getOrElse(-1)))
+ .createOrReplaceTempView("t1")
+ checkSparkAnswerAndOperator(
+ sql(s"SELECT elt(idx, ${schema.fieldNames.mkString(",")}) FROM t1"))
+ checkSparkAnswerAndOperator(
+ sql(s"SELECT elt(cast(null as int),
${schema.fieldNames.mkString(",")}) FROM t1"))
+ checkSparkAnswerMaybeThrows(sql("SELECT elt(1) FROM t1")) match {
+ case (Some(spark), Some(comet)) =>
+
assert(spark.getMessage.contains(WRONG_NUM_ARGS_WITHOUT_SUGGESTION_EXCEPTION_MSG))
+
assert(comet.getMessage.contains(WRONG_NUM_ARGS_WITHOUT_SUGGESTION_EXCEPTION_MSG))
+ case (spark, comet) =>
+ fail(
+ s"Expected Spark and Comet to throw exception, but got\nSpark:
$spark\nComet: $comet")
Review Comment:
it would be great if you can add more tests to cover other code paths
1. Non string data types from pos 2
2. ANSI mode should fall back successfully to Spark
3. Index out of range and make sure we return NULL
4. Verify that first arg being Int is being enforced properly (perhaps pass
a string / long type input as a negative test)
##########
spark/src/main/scala/org/apache/comet/serde/strings.scala:
##########
@@ -289,6 +289,34 @@ object CometRegExpReplace extends
CometExpressionSerde[RegExpReplace] {
}
}
+object CometElt extends CometScalarFunction[Elt]("elt") {
+
+ override def getSupportLevel(expr: Elt): SupportLevel = {
+ if (expr.failOnError) {
+ return Unsupported(Some("ANSI mode not supported"))
+ }
+ if (expr.children.length < 2) {
+ return Unsupported(Some("The `elt` requires > 1 parameters but the
actual number is 1"))
+ }
+ val idxDataType = expr.children.head.dataType
+ if (idxDataType != IntegerType) {
+ return Unsupported(Some(s"Parameter 1 requires the int type, but got:
$idxDataType"))
+ }
+ if (!expr.children.tail.forall(_.dataType == StringType)) {
+ val unsupportedTypes =
+ expr.children.tail
+ .filter(_.dataType != StringType)
+ .map(_.dataType)
+ .distinct
+ .mkString(", ")
+ return Unsupported(
+ Some(
+ s"Parameters 2 and onwards require the string type, but contains:
$unsupportedTypes"))
Review Comment:
minor : might be a good idea to not do distinct here ..
Some of the options I could think of are
1. Exhaustive error message : `elt(1 , "apple", "banana", 10 , true ,
"StringType")` will produce ` Parameters 2 and onwards require the string type,
but contains: StringType, IntegerType , BooleanType ` and could keep the user
guessing . However, if we remove distinct and preserve order , it might be
`Parameters 2 and onwards require the string type, but datatypes supplied are
StringType, IntegerType , BooleanType , StringType `
2. Break at first non string type and return unsupported to reduce
redundancy
But we could do that in a follow up PR by raising an issue on github too
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]