[jira] [Commented] (BEAM-5690) Issue with GroupByKey in BeamSql using SparkRunner

Xu Mingmin (JIRA) Tue, 09 Oct 2018 10:22:39 -0700


    [ 
https://issues.apache.org/jira/browse/BEAM-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16643769#comment-16643769
 ]


Xu Mingmin commented on BEAM-5690:
----------------------------------

Is this the error specifically? Seems duplicated {{0}} counts here,
{code:java}
{"count_col1":1,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  
+0000","WET":"2018-10-09  09-56-00 0000  +0000"}
{"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  
+0000","WET":"2018-10-09  09-56-00 0000  +0000"}
{"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  
+0000","WET":"2018-10-09  09-56-00 0000  +0000"}
{"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  
+0000","WET":"2018-10-09  09-56-00 0000  +0000"}
{"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  
+0000","WET":"2018-10-09  09-56-00 0000  +0000"}
{"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  
+0000","WET":"2018-10-09  09-56-00 0000  +0000"}
{"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  
+0000","WET":"2018-10-09  09-56-00 0}{code}

> Issue with GroupByKey in BeamSql using SparkRunner
> --------------------------------------------------
>
>                 Key: BEAM-5690
>                 URL: https://issues.apache.org/jira/browse/BEAM-5690
>             Project: Beam
>          Issue Type: Task
>          Components: runner-spark
>            Reporter: Kenneth Knowles
>            Priority: Major
>
> Reported on user@
> {quote}We are trying to setup a pipeline with using BeamSql and the trigger 
> used is default (AfterWatermark crosses the window). 
> Below is the pipeline:
>   
>    KafkaSource (KafkaIO) 
>        ---> Windowing (FixedWindow 1min)
>        ---> BeamSql
>        ---> KafkaSink (KafkaIO)
>                          
> We are using Spark Runner for this. 
> The BeamSql query is:
> {code}select Col3, count(*) as count_col1 from PCOLLECTION GROUP BY Col3{code}
> We are grouping by Col3 which is a string. It can hold values string[0-9]. 
>              
> The records are getting emitted out at 1 min to kafka sink, but the output 
> record in kafka is not as expected.
> Below is the output observed: (WST and WET are indicators for window start 
> time and window end time)
> {code}
> {"count_col1":1,"Col3":"string5","WST":"2018-10-09  09-55-00 0000  
> +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
> {"count_col1":3,"Col3":"string7","WST":"2018-10-09  09-55-00 0000  
> +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
> {"count_col1":2,"Col3":"string8","WST":"2018-10-09  09-55-00 0000  
> +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
> {"count_col1":1,"Col3":"string2","WST":"2018-10-09  09-55-00 0000  
> +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
> {"count_col1":1,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  
> +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  
> +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  
> +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  
> +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  
> +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  
> +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  
> +0000","WET":"2018-10-09  09-56-00 0}
> {code}
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (BEAM-5690) Issue with GroupByKey in BeamSql using SparkRunner

Reply via email to