[
https://issues.apache.org/jira/browse/SPARK-10310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yi Zhou updated SPARK-10310:
----------------------------
Summary: [Spark SQL] All result records will be popluated in ONE line due
to missing the correct line/filed delimeter (was: [Spark SQL] All result
records will be popluated in ONE line due to missing the correct line/filed
separator)
> [Spark SQL] All result records will be popluated in ONE line due to missing
> the correct line/filed delimeter
> ------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-10310
> URL: https://issues.apache.org/jira/browse/SPARK-10310
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Reporter: Yi Zhou
> Priority: Blocker
>
> There is real case using python stream script in Spark SQL query. We found
> that all result records from "select" write in ONE line as input for python
> script and so it cause script will not identify each record.Other, filed
> separator in spark sql will be '^A' or '\001' which is
> inconsistent/incompatible the '\t' in Hive implementation.
> #################Key Query####################:
> CREATE VIEW temp1 AS
> SELECT *
> FROM
> (
> FROM
> (
> SELECT
> c.wcs_user_sk,
> w.wp_type,
> (wcs_click_date_sk * 24 * 60 * 60 + wcs_click_time_sk) AS tstamp_inSec
> FROM web_clickstreams c, web_page w
> WHERE c.wcs_web_page_sk = w.wp_web_page_sk
> AND c.wcs_web_page_sk IS NOT NULL
> AND c.wcs_user_sk IS NOT NULL
> AND c.wcs_sales_sk IS NULL --abandoned implies: no sale
> DISTRIBUTE BY wcs_user_sk SORT BY wcs_user_sk, tstamp_inSec
> ) clicksAnWebPageType
> REDUCE
> wcs_user_sk,
> tstamp_inSec,
> wp_type
> USING 'python sessionize.py 3600'
> AS (
> wp_type STRING,
> tstamp BIGINT,
> sessionid STRING)
> ) sessionized
> #############Key Python Script#################
> for line in sys.stdin:
> user_sk, tstamp_str, value = line.strip().split("\t")
> ############Result Records example from 'select' ##################
> ^V31^A3237764860^Afeedback^U31^A3237769106^Adynamic^T31^A3237779027^Areview
> ############Result Records example in format######################
> 31 3237764860 feedback
> 31 3237769106 dynamic
> 31 3237779027 review
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]