[jira] [Comment Edited] (SPARK-23291) SparkR : substr : In SparkR dataframe , starting and ending position arguments in "substr" is giving wrong result when the position is greater than 1

Hyukjin Kwon (JIRA) Sun, 06 May 2018 19:05:21 -0700

    [ https://issues.apache.org/jira/browse/SPARK-23291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16465356#comment-16465356 ]


Hyukjin Kwon edited comment on SPARK-23291 at 5/7/18 2:04 AM:
--------------------------------------------------------------

[~felixcheung], sure, I agree with that in general. However, we could probably 
look at this specific case another way:

It has been wrong for 3 years, and it requires weird code for R specifically, 
compared to the other language APIs. IMHO, it's a bit subtle, and users might 
have adapted to this bug rather than bothering to report it (with some 
nuisance, I guess). Consider that expr("substr(...)") and substr work 
differently. I am also seeing [expr("substr(...)") suggested as an 
alternative to 
substr|https://stackoverflow.com/questions/37413122/use-of-substr-on-dataframe-column-in-sparkr?rq=1].
  If it's clearly documented in the migration guide, I think it can be fine.

Also, this substr case is pretty well understood and isolated.

As a reference, I recall a case - 
https://github.com/apache/spark/pull/20499#issuecomment-363863660. It sounds 
pretty similar to this one. I was hesitant at that time too, but after 
thinking for a while, I ended up more or less agreeing that the backport is 
okay. That wasn't a regression either.




> SparkR : substr : In SparkR dataframe , starting and ending position 
> arguments in "substr" is giving wrong result  when the position is greater 
> than 1
> ------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-23291
>                 URL: https://issues.apache.org/jira/browse/SPARK-23291
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>    Affects Versions: 2.1.2, 2.2.0, 2.2.1, 2.3.0
>            Reporter: Narendra
>            Assignee: Liang-Chi Hsieh
>            Priority: Major
>             Fix For: 2.4.0
>
>
> Defect Description :
> -----------------------------
> For example, an input string "2017-12-01" is read into a SparkR dataframe 
> "df" with column name "col1".
>  The goal is to create a new column named "col2" with the value "12", 
> which is inside the string. "12" can be extracted with a "starting position" 
> of "6" and an "ending position" of "7"
>  (the position of the first character is considered to be "1").
> But the code that currently needs to be written is:
>  
>  df <- withColumn(df,"col2",substr(df$col1,7,8))
> Observe that the first argument of the "substr" API, which indicates the 
> 'starting position', is given as "7".
>  Also, observe that the second argument of the "substr" API, which indicates 
> the 'ending position', is given as "8",
> i.e. the number that must be given to indicate a position is 
> the "actual position + 1".
> Expected behavior :
> ----------------------------
> The code that should need to be written is:
>  
>  df <- withColumn(df,"col2",substr(df$col1,6,7))
> Note :
> -----------
>  This defect is observed only when the starting position is greater than 
> 1.
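
The off-by-one described in the report can be reproduced outside Spark as a 
rough sketch. Base R's substr takes 1-based, inclusive start and stop 
positions, which is the behavior the report expects from SparkR. The helper 
buggy_substr_model below is hypothetical and is not SparkR code: it assumes 
the starting position is shifted back by one before a position/length 
extraction, and that a position of 0 is treated like 1, which is one way to 
account for the symptoms in the report.

```r
x <- "2017-12-01"

# Base R: start and stop are 1-based and inclusive.
expected <- substr(x, 6, 7)  # "12"

# Hypothetical model of the reported symptom (an assumption, not SparkR source).
buggy_substr_model <- function(s, start, stop) {
  pos <- start - 1           # assumed extra shift of the starting position
  len <- stop - start + 1    # requested number of characters
  first <- max(pos, 1)       # assumption: position 0 behaves like position 1
  substr(s, first, first + len - 1)
}

workaround <- buggy_substr_model(x, 7, 8)  # arguments from the report's workaround
naive      <- buggy_substr_model(x, 6, 7)  # arguments a base R user would pass
at_start   <- buggy_substr_model(x, 1, 4)  # starting position of 1
```

Under these assumptions, the workaround arguments (7, 8) select the same 
characters as base R's substr(x, 6, 7), the natural arguments (6, 7) land one 
character early, and a starting position of 1 is unaffected, which matches 
the note in the report.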


