[ https://issues.apache.org/jira/browse/KYLIN-4246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andras Istvan Nagy updated KYLIN-4246:
--------------------------------------
    Description: 
We ran into an issue that seems to be related to the real-time streaming 
receiver.

We have an optional field in the Kafka messages: some messages contain a value 
for it, while in others the field is missing from the JSON entirely. This field 
is defined as a dimension and is used in our queries. We assumed that when the 
field is missing from the JSON message, its value would be interpreted as null 
(as is the case with the Kylin batch engine).

Queries that include this field return correct results for segments that have 
been rebuilt with the Kylin batch engine, but incorrect results for segments 
built by the streaming receiver.

For example, in the query below (a simplified version of our actual queries), 
optional_field is the optional field and mandatory_field always has a value. 
The query should return 0, because we have no records where optional_field has 
a value (is not null) and mandatory_field='X'. Nevertheless, for segments 
coming from the streaming receiver, we get non-zero counts.

{{select count( * )}}
 {{from movement_events}}
 {{where mandatory_field='X'}}
 {{and optional_field is not null;}}
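To make the expected semantics concrete, here is a minimal Python sketch (not Kylin code). It flattens JSON messages into dimension rows and then evaluates the predicate of the query above. The sample messages and the helpers `to_row` and `count_query` are hypothetical. The sketch also assumes, purely for illustration, that a parser might substitute a non-null placeholder such as an empty string for a missing key. Nothing in this report establishes that this is what the streaming receiver actually does.

```python
import json

# Hypothetical sample messages: "optional_field" is absent from every message
# where mandatory_field == "X", mirroring the situation described above.
MESSAGES = [
    '{"mandatory_field": "X"}',
    '{"mandatory_field": "X"}',
    '{"mandatory_field": "Y", "optional_field": "a"}',
]

DIMENSIONS = ["mandatory_field", "optional_field"]


def to_row(raw, missing=None):
    """Flatten one JSON message into a dimension row (hypothetical helper).

    `missing` is the value used for an absent key: None models SQL NULL
    (the batch-engine behaviour expected above), while "" models an assumed
    parser that fills absent keys with an empty string.
    """
    record = json.loads(raw)
    return {dim: record.get(dim, missing) for dim in DIMENSIONS}


def count_query(rows):
    """Mimic: count(*) where mandatory_field='X' and optional_field is not null."""
    return sum(
        1
        for row in rows
        if row["mandatory_field"] == "X" and row["optional_field"] is not None
    )


null_rows = [to_row(m) for m in MESSAGES]                # missing key -> None
empty_rows = [to_row(m, missing="") for m in MESSAGES]   # missing key -> ""

print(count_query(null_rows))   # 0
print(count_query(empty_rows))  # 2
```

When missing keys become None, the count is 0, which matches the batch result. With an empty-string default, both `X` messages satisfy `is not null` and the count becomes 2. That is the same shape as the discrepancy seen on streaming segments, though the real cause may differ.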

Is this a known issue? Is there a way to avoid it without changing the source 
events?

 



> Wrong results from real-time streaming when an optional field is used as a 
> dimension
> ------------------------------------------------------------------------------------
>
>                 Key: KYLIN-4246
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4246
>             Project: Kylin
>          Issue Type: Bug
>          Components: Real-time Streaming
>    Affects Versions: v3.0.0-alpha
>            Reporter: Andras Istvan Nagy
>            Priority: Critical



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
