[ https://issues.apache.org/jira/browse/SPARK-12352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yadong Qi updated SPARK-12352:
------------------------------
    Description: 
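When split is used in SQL and several values are read by index from the same resulting array, the same row is split again for every access. In the optimized plan below, split(value#20,,) is evaluated once for each of the output columns a, b and c. In addition, the underlying split in Java performs poorly.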

{code}
spark-sql> explain extended select array[0] as a, array[1] as b, array[2] as c from (select split(value, ',') as array from src_split) t;
== Parsed Logical Plan ==
'Project [unresolvedalias('array[0] AS a#16),unresolvedalias('array[1] AS b#17),unresolvedalias('array[2] AS c#18)]
 'Subquery t
  'Project [unresolvedalias('split('value,,) AS array#15)]
   'UnresolvedRelation [src_split], None

== Analyzed Logical Plan ==
a: string, b: string, c: string
Project [array#15[0] AS a#16,array#15[1] AS b#17,array#15[2] AS c#18]
 Subquery t
  Project [split(value#20,,) AS array#15]
   MetastoreRelation default, src_split, None

== Optimized Logical Plan ==
Project [split(value#20,,)[0] AS a#16,split(value#20,,)[1] AS b#17,split(value#20,,)[2] AS c#18]
 MetastoreRelation default, src_split, None

== Physical Plan ==
Project [split(value#20,,)[0] AS a#16,split(value#20,,)[1] AS b#17,split(value#20,,)[2] AS c#18]
 HiveTableScan [value#20], (MetastoreRelation default, src_split, None)
{code}
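
To make the cost concrete, here is a hypothetical plain-Scala sketch (no Spark dependency) that mimics the two evaluation strategies. CountingSplit, projectInlined and projectReused are illustrative names, not Spark APIs. projectInlined corresponds to the optimized plan above, where each index access re-evaluates split. projectReused splits each row once and indexes the same array, which is the behaviour this issue asks for.

```scala
// Hypothetical sketch (plain Scala, not Spark internals): compares re-evaluating
// split for every index access with splitting once per row and reusing the array.
object SplitReuseSketch {

  // Wraps java.lang.String.split and counts how many times it is invoked.
  final class CountingSplit(delimiterRegex: String) {
    private var calls = 0
    def callCount: Int = calls
    def apply(value: String): Array[String] = {
      calls += 1
      value.split(delimiterRegex)
    }
  }

  // Mirrors: Project [split(value)[0] AS a, split(value)[1] AS b, split(value)[2] AS c]
  def projectInlined(split: CountingSplit, value: String): Seq[String] =
    Seq(split(value)(0), split(value)(1), split(value)(2))

  // Mirrors the desired plan: split once, then read each index from the same array.
  def projectReused(split: CountingSplit, value: String): Seq[String] = {
    val parts = split(value)
    Seq(parts(0), parts(1), parts(2))
  }

  def main(args: Array[String]): Unit = {
    val rows = Seq("1,2,3", "4,5,6", "7,8,9")

    val inlinedSplit = new CountingSplit(",")
    val inlined = rows.map(projectInlined(inlinedSplit, _))

    val reusedSplit = new CountingSplit(",")
    val reused = rows.map(projectReused(reusedSplit, _))

    // Both strategies yield the same columns; only the number of splits differs.
    assert(inlined == reused)
    println(s"inlined: ${inlinedSplit.callCount} splits, reused: ${reusedSplit.callCount} splits")
  }
}
```

For the three sample rows, main prints "inlined: 9 splits, reused: 3 splits". The inlined form pays for one split per accessed index, so the overhead grows with the number of columns extracted from the array.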

  was:
When split is used in SQL and several values are read by index from the same resulting array, the same row is split again for every access.

{code}
spark-sql> explain extended select array[0] as a, array[1] as b, array[2] as c from (select split(value, ',') as array from src_split) t;
== Parsed Logical Plan ==
'Project [unresolvedalias('array[0] AS a#16),unresolvedalias('array[1] AS b#17),unresolvedalias('array[2] AS c#18)]
 'Subquery t
  'Project [unresolvedalias('split('value,,) AS array#15)]
   'UnresolvedRelation [src_split], None

== Analyzed Logical Plan ==
a: string, b: string, c: string
Project [array#15[0] AS a#16,array#15[1] AS b#17,array#15[2] AS c#18]
 Subquery t
  Project [split(value#20,,) AS array#15]
   MetastoreRelation default, src_split, None

== Optimized Logical Plan ==
Project [split(value#20,,)[0] AS a#16,split(value#20,,)[1] AS b#17,split(value#20,,)[2] AS c#18]
 MetastoreRelation default, src_split, None

== Physical Plan ==
Project [split(value#20,,)[0] AS a#16,split(value#20,,)[1] AS b#17,split(value#20,,)[2] AS c#18]
 HiveTableScan [value#20], (MetastoreRelation default, src_split, None)
{code}


> Reuse the result of split in SQL
> --------------------------------
>
>                 Key: SPARK-12352
>                 URL: https://issues.apache.org/jira/browse/SPARK-12352
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.5.2
>            Reporter: Yadong Qi
>



