I'd like to propose dropping support for Hadoop 2 in Druid 28. That is not
the very next release (which I assume will be Druid 27) but the one after
it, likely in the late 2023 timeframe.

In 2021, we had a discussion about moving away from Hadoop 2:
https://lists.apache.org/thread/zmc389trnkh6x444so8mdb2h0x0noqq4. For
various reasons, it didn't seem like the right time. However, I believe now
is the right time:

1) We didn't support Hadoop 3 in 2021, but we support it now. There is now
a Hadoop 3 build profile, as well as convenience binaries on
https://druid.apache.org/downloads.html.

2) We have SQL-based ingest with MSQ tasks, which provides a built-in,
scalable, and robust alternative to using Hadoop at all.

3) It has been an additional two years. Hadoop 2 is that much older, that
much more time has passed since it was superseded by Hadoop 3, and people
have had that much more time to migrate.

4) The original main reason for wanting to move away from Hadoop 2 is still
relevant. It keeps us on various old dependencies, including an ancient
version of Guava, which in turn has been keeping us on an ancient version
of Calcite. The Calcite community has graciously decided to support this
old version of Guava for at least one release, but plans to drop support by
Calcite 1.36, leaving us back in the same position. Managing this situation
is time-consuming for both Druid and Calcite maintainers.

5) Other solutions beyond dropping Hadoop 2 support were proposed in 2021,
such as reworking Hadoop support to be purely extension based, and
reworking extensions to be more isolated from each other. However, these
are both substantially more complex than dropping support, and in the two
years since the original thread, these more complex solutions have not been
implemented. So, I think we need to move on with the simpler solution of
dropping support.

Gian
