RE: [jira] [Created] (FLINK-5722) Implement DISTINCT as dedicated operator

Radu Tudoran Mon, 06 Feb 2017 06:08:59 -0800

Hi,

Should we also discuss the design of DISTINCT for the streaming case?
It could fit well in the context of tables as well as in the context of 
aggregates over windows...


Dr. Radu Tudoran
Senior Research Engineer - Big Data Expert
IT R&D Division


HUAWEI TECHNOLOGIES Duesseldorf GmbH
European Research Center
Riesstrasse 25, 80992 München

E-mail: radu.tudo...@huawei.com
Mobile: +49 15209084330
Telephone: +49 891588344173



-----Original Message-----
From: Fabian Hueske (JIRA) [mailto:j...@apache.org] 
Sent: Monday, February 06, 2017 2:56 PM
To: dev@flink.apache.org
Subject: [jira] [Created] (FLINK-5722) Implement DISTINCT as dedicated operator

Fabian Hueske created FLINK-5722:
------------------------------------

             Summary: Implement DISTINCT as dedicated operator
                 Key: FLINK-5722
                 URL: https://issues.apache.org/jira/browse/FLINK-5722
             Project: Flink
          Issue Type: Improvement
          Components: Table API & SQL
    Affects Versions: 1.2.0, 1.3.0
            Reporter: Fabian Hueske


In the batch Table API / SQL, DISTINCT is currently implemented as an 
aggregate that groups on all fields. Grouped aggregates are implemented as a 
GroupReduce with a sort-based combiner.

This operator can be implemented more efficiently by using a ReduceFunction 
and hinting a hash-based combine strategy. The same ReduceFunction can be used 
for all DISTINCT operations and can be annotated with the appropriate 
forwarded-field annotations.
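
To illustrate the difference between the two combine strategies, here is a 
minimal sketch in plain Python. It does not use Flink; the function names 
(keep_first, distinct_sort_combine, distinct_hash_combine) are hypothetical 
and only model the idea. The reduce function for DISTINCT can simply return 
one of its two (equal) inputs, which is why a single function would work for 
every DISTINCT operation.

```python
from itertools import groupby


def keep_first(left, right):
    """Reduce function for DISTINCT: both rows are equal, so keep either one."""
    return left


def distinct_sort_combine(rows):
    """Sort-based model: sort all rows, then reduce each run of equal rows."""
    result = []
    for _, run in groupby(sorted(rows)):
        acc = next(run)
        for row in run:
            acc = keep_first(acc, row)
        result.append(acc)
    return result


def distinct_hash_combine(rows):
    """Hash-based model: one hash-table slot per distinct row, no sorting."""
    table = {}
    for row in rows:
        # The whole row is the grouping key, since DISTINCT groups on all fields.
        table[row] = keep_first(table[row], row) if row in table else row
    return list(table.values())


rows = [("a", 1), ("b", 2), ("a", 1), ("c", 3), ("b", 2)]
sorted_result = distinct_sort_combine(rows)
hashed_result = distinct_hash_combine(rows)
```

Both variants return the same set of rows. The hash-based one avoids the sort 
and only needs memory proportional to the number of distinct rows, which is 
presumably where the expected efficiency gain comes from.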

We would need a custom conversion rule which translates distinct aggregations 
(grouping on all fields and returning all fields) into a custom DataSetRelNode.
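
The matching condition of such a rule could look roughly like the following 
sketch. It is plain Python, not the Flink or Calcite API: AggregateNode and 
is_distinct are hypothetical names, and the node is assumed to carry its 
input field count, its grouping fields, and its aggregate calls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AggregateNode:
    """Hypothetical stand-in for a logical aggregate in the query plan."""
    input_field_count: int
    group_fields: frozenset
    agg_calls: tuple = ()


def is_distinct(node):
    """True if the aggregate groups on all input fields and computes nothing else."""
    all_fields = frozenset(range(node.input_field_count))
    return not node.agg_calls and node.group_fields == all_fields


# SELECT DISTINCT a, b, c: groups on every field, no aggregate calls.
distinct_node = AggregateNode(3, frozenset({0, 1, 2}))
# SELECT a, SUM(b) ... GROUP BY a: a regular grouped aggregate.
grouped_node = AggregateNode(3, frozenset({0}), ("SUM(b)",))
```

A rule built on this check would only fire for the first node and leave 
regular grouped aggregates to the existing translation.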



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
