[jira] [Updated] (FLINK-28120) Meet assert error: BatchPhysicalExchange.BATCH_PHYSICAL has lower cost then best cost of subset :RelSubset#15.BATCH_PHYSICAL.hash[0, 1]true.[]]

luoyuxia (Jira) Sun, 19 Jun 2022 18:58:09 -0700


     [ 
https://issues.apache.org/jira/browse/FLINK-28120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


luoyuxia updated FLINK-28120:
-----------------------------
    Description: 
When I run the following SQL with the Hive dialect:

 
{code:sql}
create table src(key string, value string);

SELECT key, value FROM
(
  SELECT key, value FROM src
  UNION ALL
  SELECT key, key as value FROM ( 
    SELECT distinct key FROM (
      SELECT key, value FROM (
        SELECT key, value FROM src
        UNION ALL
        SELECT key, value FROM src
      )t1 
    group by key, value)t2
  )t3
)t4
group by key, value {code}
 

 

it throws the following exception:

 
{code:java}
Caused by: java.lang.AssertionError: rel 
[rel#1507:BatchPhysicalExchange.BATCH_PHYSICAL.hash[0, 
1]true.[](input=RelSubset#999,distribution=hash[key, value])] has lower cost 
{8.657154570189462E8 rows, 2.9568623376365746E10 cpu, 7.2E9 io, 
3.394292742113678E9 network, 4.944093593596532E9 memory} than best cost 
{8.657154570189462E8 rows, 2.9568623376365746E10 cpu, 7.2E9 io, 
3.3942927421136775E9 network, 4.944093593596532E9 memory} of subset 
[rel#1103:RelSubset#15.BATCH_PHYSICAL.hash[0, 1]true.[]] {code}
I then checked the code where the assertion is thrown and found this check:

 
{code:java}
if (relCost.isLt(subset.bestCost)) {
  return litmus.fail("rel [{}] has lower cost {} than "
          + "best cost {} of subset [{}]",
          rel, relCost, subset.bestCost, subset);
} {code}
So the exception is thrown because relCost is considered less than the best cost.

But relCost is actually greater than the best cost (its network component is 
3.394292742113678E9 versus 3.3942927421136775E9), as shown below:

!截屏2022-06-18 上午11.48.46.png|width=391,height=268!

 

It seems the cost comparison logic in Flink is broken.

I then looked at the method #isLt in FlinkCost, which depends on #isLe and 
#equals. But #isLe uses normalizeCost while #equals does not, which makes the 
two methods inconsistent.

In this case, the normalizeCost of relCost and of bestCost is the same. Although 
the network values differ, they end up equal once folded into the normalized 
cost, which looks like precision loss in double arithmetic.

So #isLe returns true, but #equals compares io, network, and memory 
separately and returns false. Then #isLt = #isLe(other) && !#equals(other) 
evaluates to true, which triggers the exception.

To fix it, I think we should change #equals so that it is consistent with the 
comparison used in #isLe.
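
The inconsistency described above can be reproduced in isolation. The following is a minimal sketch, not the actual FlinkCost code: the class and method names (CostComparisonSketch, Cost, normalized, equalsComponentwise, equalsNormalized) are hypothetical, the normalized cost is assumed to be a plain sum of io, network, and memory (the real weighting may differ), and the input values are chosen so that the small network difference is certain to be absorbed by double rounding, rather than being the values from the stack trace.

```java
final class CostComparisonSketch {

    /** Simplified, hypothetical stand-in for a cost with three components. */
    static final class Cost {
        final double io;
        final double network;
        final double memory;

        Cost(double io, double network, double memory) {
            this.io = io;
            this.network = network;
            this.memory = memory;
        }

        /** Assumed normalization: a plain sum. The real formula may weight components. */
        double normalized() {
            return io + network + memory;
        }

        /** "Less than or equal" decided on the normalized value. */
        boolean isLe(Cost other) {
            return normalized() <= other.normalized();
        }

        /** Equality decided component by component (inconsistent with isLe). */
        boolean equalsComponentwise(Cost other) {
            return io == other.io && network == other.network && memory == other.memory;
        }

        /** Equality decided on the same normalized value that isLe uses. */
        boolean equalsNormalized(Cost other) {
            return normalized() == other.normalized();
        }

        /** isLt built from isLe and the component-wise equality. */
        boolean isLtInconsistent(Cost other) {
            return isLe(other) && !equalsComponentwise(other);
        }

        /** isLt built from isLe and the normalized equality. */
        boolean isLtConsistent(Cost other) {
            return isLe(other) && !equalsNormalized(other);
        }
    }

    public static void main(String[] args) {
        // rel has a slightly LARGER network cost than best, but the difference
        // is far below the rounding granularity of a sum near 1e17.
        Cost best = new Cost(1e17, 1.0, 0.0);
        Cost rel = new Cost(1e17, Math.nextUp(1.0), 0.0);

        System.out.println(rel.isLtInconsistent(best)); // true: rel is reported as cheaper
        System.out.println(rel.isLtConsistent(best));   // false: rel is treated as equal
    }
}
```

With the component-wise equality, rel is reported as strictly cheaper than best even though none of its components is smaller. Basing equality on the same normalized value as #isLe removes that contradiction in this sketch.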

 

 

> Meet assert error: BatchPhysicalExchange.BATCH_PHYSICAL has lower cost then  
> best cost  of subset :RelSubset#15.BATCH_PHYSICAL.hash[0, 1]true.[]]
> -------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-28120
>                 URL: https://issues.apache.org/jira/browse/FLINK-28120
>             Project: Flink
>          Issue Type: Bug
>          Components: Table SQL / Planner
>            Reporter: luoyuxia
>            Priority: Major
>             Fix For: 1.16.0
>
>         Attachments: 截屏2022-06-18 上午11.48.46.png
>



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
