andygrove commented on code in PR #19570:
URL: https://github.com/apache/datafusion/pull/19570#discussion_r2661086930


##########
datafusion/functions/src/string/split_part.rs:
##########
@@ -219,22 +219,22 @@ where
         .try_for_each(|((string, delimiter), n)| -> Result<(), DataFusionError> {
             match (string, delimiter, n) {
                 (Some(string), Some(delimiter), Some(n)) => {
-                    let split_string: Vec<&str> = string.split(delimiter).collect();
-                    let len = split_string.len();
-
-                    let index = match n.cmp(&0) {
-                        std::cmp::Ordering::Less => len as i64 + n,
+                    let result = match n.cmp(&0) {
+                        std::cmp::Ordering::Greater => {
+                            // Positive index: use nth() to avoid collecting all parts
+                            // This stops iteration as soon as we find the nth element
+                            string.split(delimiter).nth((n - 1) as usize)
+                        }
+                        std::cmp::Ordering::Less => {
+                            // Negative index: use rsplit().nth() to efficiently get from the end
+                            // rsplit iterates in reverse, so -1 means first from rsplit (index 0)
+                            string.rsplit(delimiter).nth((-n - 1) as usize)

Review Comment:
   Good catch, thanks. I changed it to use try_into with appropriate error handling



##########
datafusion/functions/src/string/split_part.rs:
##########
@@ -219,22 +219,22 @@ where
         .try_for_each(|((string, delimiter), n)| -> Result<(), DataFusionError> {
             match (string, delimiter, n) {
                 (Some(string), Some(delimiter), Some(n)) => {
-                    let split_string: Vec<&str> = string.split(delimiter).collect();
-                    let len = split_string.len();
-
-                    let index = match n.cmp(&0) {
-                        std::cmp::Ordering::Less => len as i64 + n,
+                    let result = match n.cmp(&0) {
+                        std::cmp::Ordering::Greater => {
+                            // Positive index: use nth() to avoid collecting all parts
+                            // This stops iteration as soon as we find the nth element
+                            string.split(delimiter).nth((n - 1) as usize)

Review Comment:
   Good catch, thanks. I changed it to use try_into with appropriate error handling



##########
datafusion/functions/src/string/split_part.rs:
##########
@@ -219,22 +219,22 @@ where
         .try_for_each(|((string, delimiter), n)| -> Result<(), DataFusionError> {
             match (string, delimiter, n) {
                 (Some(string), Some(delimiter), Some(n)) => {
-                    let split_string: Vec<&str> = string.split(delimiter).collect();
-                    let len = split_string.len();
-
-                    let index = match n.cmp(&0) {
-                        std::cmp::Ordering::Less => len as i64 + n,
+                    let result = match n.cmp(&0) {
+                        std::cmp::Ordering::Greater => {
+                            // Positive index: use nth() to avoid collecting all parts
+                            // This stops iteration as soon as we find the nth element
+                            string.split(delimiter).nth((n - 1) as usize)
+                        }
+                        std::cmp::Ordering::Less => {
+                            // Negative index: use rsplit().nth() to efficiently get from the end
+                            // rsplit iterates in reverse, so -1 means first from rsplit (index 0)
+                            string.rsplit(delimiter).nth((-n - 1) as usize)

Review Comment:
   Good catch, thanks. I changed it to use try_into with appropriate error handling
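
   For readers following the thread, here is a rough sketch of what replacing the `as usize` casts with `try_into` could look like. It is not the code from the PR: the standalone `split_part` helper, its `String` error type, the error messages, and the choice to return an empty string for a missing part are all assumptions made for illustration. The use of `unsigned_abs()` on the negative branch is also an assumption, chosen so that `i64::MIN` cannot overflow on negation.

```rust
use std::cmp::Ordering;
use std::convert::TryInto;

/// Hypothetical standalone helper (not the DataFusion implementation).
/// Returns the n-th part of `string` split by `delimiter`, 1-based;
/// negative `n` counts from the end.
fn split_part<'a>(string: &'a str, delimiter: &str, n: i64) -> Result<&'a str, String> {
    let part = match n.cmp(&0) {
        Ordering::Greater => {
            // n >= 1 here, so n - 1 cannot underflow; try_into guards
            // against targets where usize is narrower than i64.
            let idx: usize = (n - 1)
                .try_into()
                .map_err(|_| format!("split_part index {n} is out of range"))?;
            // nth() stops iterating as soon as the requested part is found.
            string.split(delimiter).nth(idx)
        }
        Ordering::Less => {
            // unsigned_abs() is at least 1 here and is defined for i64::MIN,
            // whereas `-n` would overflow for that value.
            let idx: usize = (n.unsigned_abs() - 1)
                .try_into()
                .map_err(|_| format!("split_part index {n} is out of range"))?;
            // rsplit() walks from the end, so -1 maps to index 0.
            string.rsplit(delimiter).nth(idx)
        }
        // Assumed behaviour for illustration: position zero is rejected.
        Ordering::Equal => return Err("field position must not be zero".to_string()),
    };
    // Assumed behaviour for illustration: a missing part is an empty string.
    Ok(part.unwrap_or(""))
}
```

   With this sketch, `split_part("a,b,c", ",", 2)` yields `Ok("b")` and `split_part("a,b,c", ",", -1)` yields `Ok("c")`, while an index past either end falls through to the empty string instead of wrapping.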


