This makes sense. I have some questions about method names.

What do you think about renaming `dropDuplicates` to `deduplicate`? I don't
think "drop" is the right word for this operation: it implies that records
are filtered out, whereas this operator actually issues updates and
retractions. Also, "deduplicate" is already how we talk about this feature
in the docs, so I think it would be easier for users to find.
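
To make the filter-versus-retraction distinction concrete, here is a toy
model in plain Java. It is only a sketch under assumptions: it keeps the
last row per key, the class and method names are made up for illustration,
and it does not use or reproduce Flink's actual operator. The "+I", "-U"
and "+U" labels follow the usual changelog shorthand for insert,
update-before and update-after.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of keep-last-row deduplication; NOT Flink's implementation. */
final class DedupChangelogSketch {

    /** Hypothetical input record: a key plus a value. */
    static final class Row {
        final String key;
        final int value;

        Row(String key, int value) {
            this.key = key;
            this.value = value;
        }

        @Override
        public String toString() {
            return "(" + key + "," + value + ")";
        }
    }

    /**
     * Emits "+I" for the first row seen for a key. For every later row of
     * that key it emits "-U" (retract the previous row) followed by "+U"
     * (the new row). No input row is silently filtered away.
     */
    static List<String> deduplicate(List<Row> input) {
        Map<String, Row> latest = new HashMap<>();
        List<String> changelog = new ArrayList<>();
        for (Row row : input) {
            Row previous = latest.put(row.key, row);
            if (previous == null) {
                changelog.add("+I" + row);
            } else {
                changelog.add("-U" + previous);
                changelog.add("+U" + row);
            }
        }
        return changelog;
    }

    public static void main(String[] args) {
        List<Row> input = List.of(new Row("a", 1), new Row("b", 2), new Row("a", 3));
        deduplicate(input).forEach(System.out::println);
    }
}
```

In this model, three input rows produce four changelog entries, so the
downstream consumer sees more messages than went in. That is hard to
square with a method name built around "drop".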

For null handling, I don't know how closely we want to stick to SQL
conventions, but what about making `coalesce` a top-level method? Something
like:

myTable.coalesce($("a"), 1).as("a")

We can require the next method to be an `as`. There is already precedent
for this sort of thing: `GroupedTable#aggregate` can only be followed by
`select`.
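
One way to enforce "must be followed by `as`" at compile time is to have
`coalesce` return an intermediate type whose only method is `as`. The
sketch below shows the pattern in plain Java. Everything in it is
hypothetical: `SketchTable` and `PendingCoalesce` are stand-ins that do not
exist in Flink, columns are plain strings rather than expressions, and I am
assuming the result replaces the original column in place.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-ins for illustration; none of these types exist in Flink. */
final class CoalesceSketch {

    /** Minimal table model: an ordered list of column expressions. */
    static final class SketchTable {
        final List<String> columns;

        SketchTable(List<String> columns) {
            this.columns = List.copyOf(columns);
        }

        /** Returns an intermediate type, not a table, so only as(...) can follow. */
        PendingCoalesce coalesce(String column, Object defaultValue) {
            return new PendingCoalesce(this, column, defaultValue);
        }
    }

    /** Intermediate result whose single method turns it back into a table. */
    static final class PendingCoalesce {
        private final SketchTable source;
        private final String column;
        private final Object defaultValue;

        PendingCoalesce(SketchTable source, String column, Object defaultValue) {
            this.source = source;
            this.column = column;
            this.defaultValue = defaultValue;
        }

        /** Replaces the coalesced column in place (an assumption) and names it. */
        SketchTable as(String alias) {
            List<String> result = new ArrayList<>();
            for (String c : source.columns) {
                result.add(c.equals(column)
                        ? "COALESCE(" + c + ", " + defaultValue + ") AS " + alias
                        : c);
            }
            return new SketchTable(result);
        }
    }

    public static void main(String[] args) {
        SketchTable myTable = new SketchTable(List.of("a", "b"));
        System.out.println(myTable.coalesce("a", 1).as("a").columns);
    }
}
```

With this shape, something like `myTable.coalesce("a", 1).coalesce("b", 2)`
does not compile, because `PendingCoalesce` has no `coalesce` method.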

Seth

On Mon, Jan 4, 2021 at 6:27 AM Wei Zhong <weizhong0...@gmail.com> wrote:

> Hi Dian,
>
> Big +1 for making the Table API easier to use. Java users and Python users
> can both benefit from it. I think it would be better if we add some Python
> API examples.
>
> Best,
> Wei
>
>
> > On Jan 4, 2021, at 20:03, Dian Fu <dian0511...@gmail.com> wrote:
> >
> > Hi all,
> >
> > I'd like to start a discussion about introducing a few convenient
> > operations in the Table API from the perspective of ease of use.
> >
> > Currently, some tasks are not easy to express in the Table API, e.g.
> > deduplication and top-N, and others are not easy to express when a table
> > has hundreds of columns, e.g. null data handling.
> >
> > I'd like to propose introducing a few operations in the Table API with
> > the following purposes:
> > - Let Table API users easily leverage the powerful features already
> >   in SQL, e.g. deduplication, top-N, etc.
> > - Provide some convenient operations, e.g. a series of operations for
> >   null data handling (which may become a problem when there are hundreds
> >   of columns), and for data sampling and splitting (a very common use
> >   case in ML, which usually needs to split a table into multiple tables
> >   for training and validation separately).
> >
> > Please refer to FLIP-155 [1] for more details.
> >
> > Looking forward to your feedback!
> >
> > Regards,
> > Dian
> >
> > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-155%3A+Introduce+a+few+convenient+operations+in+Table+API
>
>
