TsReaper opened a new pull request #13764:
URL: https://github.com/apache/flink/pull/13764


   ## What is the purpose of the change
   
   Due to CALCITE-4351, `FlinkRelMdDistinctRowCount#getDistinctRowCount(Calc)` 
will always return 0 when the number of rows is large.
   
   This PR introduces our own `FlinkRelMdUtil#numDistinctVals` to treat small 
and large inputs in different ways. For small inputs we use the more precise 
`RelMdUtil#numDistinctVals` and for large inputs we copy the old, approximated 
implementation of `RelMdUtil#numDistinctVals`.
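
The split between small and large inputs can be sketched roughly as follows. This is only an illustration, not the code in this PR: the class name `DistinctValsSketch`, the `EPS` cutoff, and both formulas are assumptions reconstructed from the description above (a precise form `n * (1 - (1 - 1/n)^k)` and an approximated form `n * (1 - e^(-k/n))`). With doubles, `1 - 1/n` rounds to exactly `1.0` once `n` is large enough, which is one way the precise form can collapse to 0.

```java
/** Hypothetical sketch of choosing between a precise and an approximated estimate. */
final class DistinctValsSketch {

    // Assumed cutoff: treat the domain as "large" when 1/domainSize drops below EPS.
    // The real threshold used by the PR is not stated in the description.
    static final double EPS = 1e-9;

    /** Precise form: n * (1 - (1 - 1/n)^k). Loses all precision for very large n. */
    static double precise(double domainSize, double numSelected) {
        return (1.0 - Math.pow(1.0 - 1.0 / domainSize, numSelected)) * domainSize;
    }

    /** Approximated form: n * (1 - e^(-k/n)), using ln(1 + x) ~= x for small x. */
    static double approximate(double domainSize, double numSelected) {
        return (1.0 - Math.exp(-numSelected / domainSize)) * domainSize;
    }

    /** Expected number of distinct values when picking numSelected rows from domainSize values. */
    static double numDistinctVals(double domainSize, double numSelected) {
        if (domainSize <= 0 || numSelected <= 0) {
            return 0.0;
        }
        double res = (1.0 / domainSize < EPS)
                ? approximate(domainSize, numSelected)  // large input: avoid the collapse to 0
                : precise(domainSize, numSelected);     // small input: keep the precise result
        // The estimate can never exceed the domain size or the number of selected rows.
        return Math.max(0.0, Math.min(res, Math.min(domainSize, numSelected)));
    }
}
```

Under these assumptions, `DistinctValsSketch.precise(1e18, 1e6)` evaluates to 0, while `DistinctValsSketch.numDistinctVals(1e18, 1e6)` stays close to the number of selected rows.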
   
   This is a temporary solution. When CALCITE-4351 is fixed we should revert 
this commit.
   
   ## Brief change log
   
    - Introduce `FlinkRelMdUtil#numDistinctVals`
   
   ## Verifying this change
   
   This change is already covered by existing tests. It also adds new tests, 
which can be run to verify it.
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): no
     - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: no
     - The serializers: no
     - The runtime per-record code paths (performance sensitive): no
     - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: no
     - The S3 file system connector: no
   
   ## Documentation
   
     - Does this pull request introduce a new feature? no
     - If yes, how is the feature documented? not applicable
   



