[ https://issues.apache.org/jira/browse/KYLIN-4246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
nichunen updated KYLIN-4246:
----------------------------
Fix Version/s: v3.0.0
> Wrong results from real-time streaming when an optional field is used as a
> dimension
> ------------------------------------------------------------------------------------
>
> Key: KYLIN-4246
> URL: https://issues.apache.org/jira/browse/KYLIN-4246
> Project: Kylin
> Issue Type: Bug
> Components: Real-time Streaming
> Affects Versions: v3.0.0-alpha
> Reporter: Andras Istvan Nagy
> Priority: Critical
> Fix For: v3.0.0
>
>
> We ran into an issue that seems to be related to the real-time streaming
> receiver.
> We have an optional field in the Kafka messages: in some cases it has a
> value, in other cases it is missing from the JSON message. This field is
> defined as a dimension and is used in the queries. We assumed that when this
> field is missing from the JSON message, its value would be interpreted as
> null (as is the case with the Kylin batch engine).
> The results of queries that include this field are correct for segments
> that have been rebuilt with the Kylin batch engine, but incorrect for
> segments built by the streaming receiver.
> For example, in the query below (a simplified version of our actual queries),
> optional_field is an optional field, and mandatory_field always has a value.
> In our case, the query should return 0, because we have no records where
> optional_field has a value (is not null) and mandatory_field='X'. Still, for
> segments that come from the streaming receiver, we get non-zero values.
> {{select count( * )}}
> {{from movement_events}}
> {{where mandatory_field='X'}}
> {{and optional_field is not null;}}
> Is this a known issue? Can we avoid it somehow without changing the source
> events?
>
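For illustration only, here is a minimal Python sketch of the semantics the reporter expects: a field absent from the JSON message is treated the same as an explicit null, so the example query counts 0 rows. The helper names (extract_dimensions, count_matching) and the sample messages are hypothetical and are not part of Kylin or its streaming receiver.

```python
import json

def extract_dimensions(raw_message, fields):
    """Parse one JSON event; a field absent from the message maps to None (SQL NULL)."""
    event = json.loads(raw_message)
    return {name: event.get(name) for name in fields}

def count_matching(messages):
    """Mimic: select count(*) where mandatory_field='X' and optional_field is not null."""
    fields = ("mandatory_field", "optional_field")
    rows = [extract_dimensions(m, fields) for m in messages]
    return sum(
        1
        for row in rows
        if row["mandatory_field"] == "X" and row["optional_field"] is not None
    )

# Hypothetical sample events: the optional field is missing, present, or explicitly null.
messages = [
    '{"mandatory_field": "X"}',
    '{"mandatory_field": "Y", "optional_field": "a"}',
    '{"mandatory_field": "X", "optional_field": null}',
]

print(count_matching(messages))
```

Under these assumptions, no sample event has both mandatory_field='X' and a non-null optional_field, which matches the result the reporter expects from every segment.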