This is an automated email from the ASF dual-hosted git repository.
srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new a37c265371d [SPARK-44732][XML][FOLLOWUP] Partial backport of spark-xml "Shortcut common type inference cases to fail fast"
a37c265371d is described below
commit a37c265371dc861fa478dd63deaa38a86415fe3b
Author: Sean Owen <[email protected]>
AuthorDate: Thu Sep 7 15:21:36 2023 -0700
[SPARK-44732][XML][FOLLOWUP] Partial backport of spark-xml "Shortcut common type inference cases to fail fast"
### What changes were proposed in this pull request?
Partial backport of https://github.com/databricks/spark-xml/commit/994e357f7666956b5d0e63627716b2c092d9abbd?diff=split from spark-xml.
### Why are the changes needed?
Although no further development was planned for spark-xml, I committed a
non-trivial improvement to its schema inference speed there anyway, to resolve a
customer issue. Part of that change can be backported here to keep the two
codebases in sync. I have attached this as a follow-up to the main code-port JIRA.
In general, there is still no intent to commit further changes to spark-xml in
the meantime unless they are significantly important.
### Does this PR introduce _any_ user-facing change?
No, this should only speed up schema inference without behavior change.
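The speedup comes from running a cheap check before an expensive one. Assuming, as the diff's comments suggest, that most of the cost lies in parse attempts that fail on non-numeric input, inspecting a single character first lets the common case return immediately. Below is a minimal standalone Scala sketch of the `isDouble` logic from the first hunk of the diff; `DoubleShortcutSketch` and `stripSign` are hypothetical names, since Spark's `TypeCast` is `private[sql]` and is not a public API.

```scala
import scala.util.control.Exception.allCatch

// Hypothetical standalone sketch of the isDouble check in the diff;
// not Spark's own (private[sql]) TypeCast API.
object DoubleShortcutSketch {

  // Strip one leading '+' or '-' so the first-character test sees the digits.
  private def stripSign(value: String): String =
    if (value.startsWith("+") || value.startsWith("-")) value.substring(1) else value

  def isDouble(value: String): Boolean = {
    val signSafeValue = stripSign(value)

    // Fail fast: require a leading digit or '.' before attempting any parse.
    if (signSafeValue.isEmpty ||
      !(Character.isDigit(signSafeValue.head) || signSafeValue.head == '.')) {
      return false
    }

    // Java's parser accepts a trailing d/D/f/F type suffix (e.g. "1.0d"),
    // so such values are ruled out explicitly. value is non-empty here
    // because signSafeValue is non-empty.
    val hasTypeSuffix = value.last match {
      case 'd' | 'D' | 'f' | 'F' => true
      case _                     => false
    }
    if (hasTypeSuffix) return false

    // Only now pay for the parse attempt; allCatch turns a thrown
    // NumberFormatException into None.
    (allCatch opt signSafeValue.toDouble).isDefined
  }
}
```

For example, `isDouble("abc")` and `isDouble("")` return `false` at the first-character check without attempting a parse, while `isDouble("1.0d")` passes that check and is then rejected by the suffix rule.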
### How was this patch tested?
Tested in spark-xml; the tests here will exercise it as well.
Closes #42844 from srowen/SPARK-44732.2.
Authored-by: Sean Owen <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
---
.../org/apache/spark/sql/catalyst/xml/TypeCast.scala | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/xml/TypeCast.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/xml/TypeCast.scala
index a00f372da7f..b065dd41f28 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/xml/TypeCast.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/xml/TypeCast.scala
@@ -155,6 +155,12 @@ private[sql] object TypeCast {
} else {
value
}
+ // A little shortcut to avoid trying many formatters in the common case that
+ // the input isn't a double. All built-in formats will start with a digit or period.
+ if (signSafeValue.isEmpty ||
+ !(Character.isDigit(signSafeValue.head) || signSafeValue.head == '.')) {
+ return false
+ }
// Rule out strings ending in D or F, as they will parse as double but should be disallowed
if (value.nonEmpty && (value.last match {
case 'd' | 'D' | 'f' | 'F' => true
@@ -171,6 +177,11 @@ private[sql] object TypeCast {
} else {
value
}
+ // A little shortcut to avoid trying many formatters in the common case that
+ // the input isn't a number. All built-in formats will start with a digit.
+ if (signSafeValue.isEmpty || !Character.isDigit(signSafeValue.head)) {
+ return false
+ }
(allCatch opt signSafeValue.toInt).isDefined
}
@@ -180,6 +191,11 @@ private[sql] object TypeCast {
} else {
value
}
+ // A little shortcut to avoid trying many formatters in the common case that
+ // the input isn't a number. All built-in formats will start with a digit.
+ if (signSafeValue.isEmpty || !Character.isDigit(signSafeValue.head)) {
+ return false
+ }
(allCatch opt signSafeValue.toLong).isDefined
}
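The integer and long hunks above apply the same fail-fast idea, minus the period case. A minimal standalone Scala sketch, again using hypothetical names (`IntegralShortcutSketch`, `stripSign`, `startsWithDigit`) rather than Spark's private `TypeCast` methods:

```scala
import scala.util.control.Exception.allCatch

// Hypothetical sketch of the integer and long checks in the diff:
// sign stripping, a leading-digit shortcut, then a guarded parse.
object IntegralShortcutSketch {

  // Strip one leading '+' or '-' before inspecting the first character.
  private def stripSign(value: String): String =
    if (value.startsWith("+") || value.startsWith("-")) value.substring(1) else value

  // Shared fail-fast test: once the sign is removed, an integral literal
  // must be non-empty and start with a digit.
  private def startsWithDigit(s: String): Boolean =
    s.nonEmpty && Character.isDigit(s.head)

  def isInteger(value: String): Boolean = {
    val signSafeValue = stripSign(value)
    // && short-circuits, so toInt is only attempted when the shortcut passes.
    startsWithDigit(signSafeValue) && (allCatch opt signSafeValue.toInt).isDefined
  }

  def isLong(value: String): Boolean = {
    val signSafeValue = stripSign(value)
    startsWithDigit(signSafeValue) && (allCatch opt signSafeValue.toLong).isDefined
  }
}
```

Because `&&` short-circuits, the parse runs only for values that start with a digit, which matches the early `return false` in the diff. Values outside the `Int` range, such as `"2147483648"`, fail `isInteger` but still pass `isLong`.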