Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/13512

The `Aggregator` API is designed for the typed `Dataset` only, not for the untyped `DataFrame`. It can work if users use `Row` as the input type of the `Aggregator`, but that's not a recommended usage.

On the other hand, it's dangerous to use a different encoder when applying a typed operation to a `Dataset`. Say we have a `Dataset[T]` called `ds`, whose data was encoded from instances of `T` by `enc1`. Now you apply an `Aggregator` to `ds` with input encoder `enc2`. We cannot guarantee that `enc2` can decode the data of `ds` back to `T` instances, because the data was encoded by `enc1`.

We can discuss it further if you have other ideas, or close this PR if you think my explanation makes sense. Thanks for working on it anyway!
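To make the hazard concrete, here is a minimal sketch in plain Scala of why decoding with a mismatched encoder is unsafe. This deliberately does NOT use Spark's real `Encoder` API; the `Enc` trait, `Enc1`, `Enc2`, and `User` are all hypothetical stand-ins that model two encoders for the same logical type with different physical layouts, playing the roles of `enc1` and `enc2` above.

```scala
// Hypothetical model of the hazard (not Spark's Encoder API): two
// "encoders" for the same case class that use different field layouts.
case class User(id: Int, score: Int)

trait Enc {
  def encode(u: User): Array[Int]
  def decode(r: Array[Int]): User
}

// enc1 stores the fields in declaration order: [id, score].
object Enc1 extends Enc {
  def encode(u: User): Array[Int] = Array(u.id, u.score)
  def decode(r: Array[Int]): User = User(r(0), r(1))
}

// enc2 stores the fields in the opposite order: [score, id].
object Enc2 extends Enc {
  def encode(u: User): Array[Int] = Array(u.score, u.id)
  def decode(r: Array[Int]): User = User(r(1), r(0))
}

object EncoderMismatch {
  def main(args: Array[String]): Unit = {
    // The Dataset's data was produced by enc1.
    val stored = Enc1.encode(User(id = 7, score = 99))

    // Decoding with the matching encoder round-trips correctly...
    assert(Enc1.decode(stored) == User(7, 99))

    // ...but decoding the same physical data with enc2 silently
    // swaps the fields: no error, just wrong values.
    assert(Enc2.decode(stored) == User(99, 7))
    println("mismatch demonstrated")
  }
}
```

Nothing in the physical representation records which encoder produced it, so the mismatch cannot be detected at decode time; this is why a typed `Aggregator` must use the `Dataset`'s own encoder rather than an arbitrary input encoder.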