Oron Navon created SPARK-25807:
----------------------------------

             Summary: Mitigate 1-based substr() confusion
                 Key: SPARK-25807
                 URL: https://issues.apache.org/jira/browse/SPARK-25807
             Project: Spark
          Issue Type: Improvement
          Components: Java API, PySpark
    Affects Versions: 2.3.2, 1.3.0, 2.4.0, 2.5.0, 3.0.0
            Reporter: Oron Navon


The method {{Column.substr()}} is 1-based, consistent with SQL's and Hive's 
{{SUBSTRING}}, but inconsistent with string indexing in Python (slicing) and 
Java ({{String.substring}}), both of which are zero-based. PySpark and Java API 
users therefore often expect a 0-based {{substr()}}. Adding to the confusion, 
{{substr()}} currently accepts a {{startPos}} of 0 and returns the same result 
as {{startPos==1}}. Code written with zero-based positions therefore runs 
without error, but its results are offset by one character for any 
{{startPos}} greater than 0.
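
To make the off-by-one concrete, here is a small pure-Python model of the 
1-based rule described above. It is not Spark code: {{sql_substr}} is a 
hypothetical helper that only mimics the described behavior for non-negative 
start positions ({{startPos}} 0 treated like 1). Negative positions are 
deliberately rejected rather than modeled.

```python
def sql_substr(s: str, pos: int, length: int) -> str:
    """Hypothetical model of a 1-based SUBSTRING(s, pos, length).

    pos == 0 is treated like pos == 1, as described in this issue.
    Negative positions are out of scope for this sketch.
    """
    if pos < 0:
        raise ValueError("negative positions are not modeled in this sketch")
    if length <= 0:
        return ""
    start = pos - 1 if pos > 0 else 0  # 1-based -> 0-based; 0 maps to the first char
    return s[start:start + length]


word = "Spark"
print(sql_substr(word, 1, 3))  # Spa  (1-based: starts at the first character)
print(sql_substr(word, 0, 3))  # Spa  (startPos 0 silently behaves like 1)
print(word[1:1 + 3])           # par  (Python slicing is 0-based)
```

A Java caller would see the same mismatch: {{"Spark".substring(1, 4)}} 
returns {{"par"}}, not {{"Spa"}}.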

Since changing {{substr()}} to 0-based is probably NOT a reasonable option 
here, I suggest making one or more of the following changes:
 # Adding a method {{substr0}}, which would be zero-based
 # Renaming {{substr}} to {{substr1}}
 # Making the existing {{substr()}} throw an exception on {{startPos==0}}, 
which would alert most users who expect zero-based behavior.
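
The proposals above could be prototyped as thin wrappers. The sketch below 
stays in plain Python so it runs without Spark: {{substr1}} stands in for the 
existing 1-based method (proposal 2), {{substr0}} shifts a zero-based start by 
one before delegating (proposal 1), and {{strict_substr}} rejects 
{{startPos==0}} (proposal 3). All three names are hypothetical. In PySpark, a 
zero-based wrapper would presumably delegate to {{Column.substr(startPos + 1, 
length)}} in the same way.

```python
def substr1(s: str, start_pos: int, length: int) -> str:
    """Stand-in for the existing 1-based substr (startPos 0 acts like 1)."""
    if start_pos < 0:
        raise ValueError("negative positions are not modeled in this sketch")
    start = max(start_pos - 1, 0)  # 1-based -> 0-based, clamping 0 to the first char
    return s[start:start + max(length, 0)]


def substr0(s: str, start_pos: int, length: int) -> str:
    """Proposal 1: zero-based variant that delegates to the 1-based one."""
    if start_pos < 0:
        raise ValueError("negative positions are not modeled in this sketch")
    return substr1(s, start_pos + 1, length)


def strict_substr(s: str, start_pos: int, length: int) -> str:
    """Proposal 3: same as substr1, but fail fast on startPos == 0."""
    if start_pos == 0:
        raise ValueError("substr() is 1-based; startPos 0 is not allowed")
    return substr1(s, start_pos, length)


word = "Spark"
print(substr0(word, 1, 3))  # par, matches word[1:4]
print(substr1(word, 1, 3))  # Spa
try:
    strict_substr(word, 0, 3)
except ValueError as err:
    print(err)  # substr() is 1-based; startPos 0 is not allowed
```

Proposal 3 turns a silent off-by-one into an explicit error, at the cost of 
breaking existing code that passes {{startPos}} 0.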

This is my first discussion on this project; apologies for any faux pas.


