[ https://issues.apache.org/jira/browse/FLINK-14567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16976296#comment-16976296 ]

Kurt Young commented on FLINK-14567:
------------------------------------

It depends on how we see this problem. IMO, if we want to emit the output 
stream in upsert fashion, the framework is required to convert all outputs 
into an upsert stream based on some keys. It's no different from requiring 
hash distribution for some operators, like HashJoin. Adding an operator to do 
the conversion is the baseline for this situation, just as we add a keyBy 
shuffle when hash distribution is required. But in some cases we can 
optimize: if we can derive the primary key information of the query, we can 
optimize the plan and get rid of the operator. Again, this is the same as 
with hash distribution, where deriving the distribution lets us avoid adding 
a dedicated keyBy shuffle. 

Back to this case: it looks to me like there are certain cases where we can't 
apply the optimization, that's all. 
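As a quick illustration (a minimal sketch, not Flink code) of why the 
optimization can't apply here: concatenation is not injective, so two 
distinct key tuples can collide after concat, and the derived column can't be 
used as a key.

{code:python}
# Two distinct composite-key values, as in the issue description.
r1 = ("a", "b", "c")
r2 = ("ab", "", "c")

# The tuples differ, so (f0, f1, f2) distinguishes the two records...
assert r1 != r2

# ...but their concatenations are identical ("abc"), so
# concat(f0, f1, f2) cannot serve as a primary key.
assert "".join(r1) == "".join(r2)
{code}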

> Aggregate query with more than two group fields can't be write into HBase sink
> ------------------------------------------------------------------------------
>
>                 Key: FLINK-14567
>                 URL: https://issues.apache.org/jira/browse/FLINK-14567
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / HBase, Table SQL / Legacy Planner, Table 
> SQL / Planner
>            Reporter: Jark Wu
>            Priority: Critical
>             Fix For: 1.10.0
>
>
> If we have an HBase table sink with a rowkey of varchar (which is also the 
> primary key) and a column of bigint, we want to write the result of the 
> following query into the sink using upsert mode. However, it fails during the 
> primary key check with the exception "UpsertStreamTableSink requires that 
> Table has a full primary keys if it is updated."
> {code:sql}
> select concat(f0, '-', f1) as key, sum(f2)
> from T1
> group by f0, f1
> {code}
> This happens in both the blink planner and the old planner, because if the 
> query works in update mode, there must be a primary key that can be 
> extracted and passed to {{UpsertStreamTableSink#setKeyFields}}. 
> That's why we wanted to derive a primary key for concat in FLINK-14539. 
> However, we found that the primary key is not preserved after concatenation. 
> For example, given a primary key (f0, f1, f2), all of varchar type, the two 
> distinct records ('a', 'b', 'c') and ('ab', '', 'c') produce the same 
> concat(f0, f1, f2) result, so the concat result is no longer a primary key.
> So here comes the problem: how can we properly support an HBase sink for 
> such a use case? 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
