[ 
https://issues.apache.org/jira/browse/SPARK-10310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10310:
------------------------------------

    Assignee: Apache Spark

> [Spark SQL] All result records will be popluated in ONE line due to missing 
> the correct line/filed separator
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-10310
>                 URL: https://issues.apache.org/jira/browse/SPARK-10310
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Yi Zhou
>            Assignee: Apache Spark
>            Priority: Blocker
>
> There is real case using python stream script in Spark SQL query. We found 
> that all result records from "select"  write in ONE line as input for python 
> script and so it cause script will not identify each record.Other, filed 
> separator in spark sql will be '^A' or '\001' which is 
> inconsistent/incompatible the '\t' in Hive implementation.
> #################Key  Query####################:
> CREATE VIEW temp1 AS
> SELECT *
> FROM
> (
>   FROM
>   (
>     SELECT
>       c.wcs_user_sk,
>       w.wp_type,
>       (wcs_click_date_sk * 24 * 60 * 60 + wcs_click_time_sk) AS tstamp_inSec
>     FROM web_clickstreams c, web_page w
>     WHERE c.wcs_web_page_sk = w.wp_web_page_sk
>     AND   c.wcs_web_page_sk IS NOT NULL
>     AND   c.wcs_user_sk     IS NOT NULL
>     AND   c.wcs_sales_sk    IS NULL --abandoned implies: no sale
>     DISTRIBUTE BY wcs_user_sk SORT BY wcs_user_sk, tstamp_inSec
>   ) clicksAnWebPageType
>   REDUCE
>     wcs_user_sk,
>     tstamp_inSec,
>     wp_type
>   USING 'python sessionize.py 3600'
>   AS (
>     wp_type STRING,
>     tstamp BIGINT, 
>     sessionid STRING)
> ) sessionized
> #############Key Python Script#################
> for line in sys.stdin:
>      user_sk,  tstamp_str, value  = line.strip().split("\t")
> ############Result Records example from 'select' ##################
> ^V31^A3237764860^Afeedback^U31^A3237769106^Adynamic^T31^A3237779027^Areview
> ############Result Records example in format######################
> 31   3237764860   feedback
> 31   3237769106   dynamic
> 31   3237779027   review



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to