Hi Ayush,

I am very happy to see that the community has started working on Hive 4.5.
It is essential for the community to clearly see how Hive continues to
evolve over time.
I have 4.5 suggestions that I think would be valuable to consider for the
4.5 release:


   - #1 As you mentioned, Iceberg v3 is a major part of this release. I
   fully agree, and I think we should clearly highlight that Hive is one of
   the core engines supporting Iceberg v3, potentially even earlier than Trino
   or other competitors. One thing I would like to draw attention to (coming
   from discussions with the Apache Impala team) is that the Vector Delete
   spec seems to have changed, with row-lineage becoming a prerequisite. As
   far as I remember, this is not yet implemented in Hive. If we want Hive to
   officially support Iceberg v3 with vector deletes, we should verify and
   address this gap. https://iceberg.apache.org/spec/#row-lineage
   - #2 Recently, Abstractdog (Laszlo) raised several ideas around
   potential performance improvements in LLAP and Tez. I think these are worth
   evaluating. Some may be configuration-level changes, while others could
   require architectural changes in Tez.
   - #3 Building on the above, I believe the community is missing clear
   performance numbers. It would be great to introduce some form of
   standardized metrics that we can publish with every release, showing how
   Hive performs on a predefined dataset. I understand this would introduce
   additional testing and infra costs, but we could discuss internally how to
   manage these challenges. I am also happy to talk with Cloudera about
   potentially supporting this initiative.
   - #4 Hadoop 3.5 support would be great. Do we plan to include a newer
   Tez version in 4.5? From what I can see, a significant number of changes
   have recently landed in the repository.
   - #4.5 CVEs for 4.5: especially for new contributors, we should
   encourage updating dependencies to the latest versions where possible.
   Since Hive is used by large government agencies, these enterprise
   requirements should be reflected upstream as well. It is acceptable if the
   final number is zero, but making this part of the process would be a great
   baseline task. Additionally, we could consider improving Dependabot usage
   or exploring newer AI-based tools to help with security improvements. (I
   know, I know, it is basic.)


-Attila

On Mon, Jan 19, 2026 at 10:12 AM Ayush Saxena <[email protected]> wrote:

> Hi folks,
> With the 4.2.0 release now behind us, I think this is a good time to
> start discussions around the next release. I wanted to start a thread
> where we can begin planning timelines and parking items that we would
> like to include.
>
> Based on some offline discussions, one initial thought is to target
> the next release roughly three months from now. This is, of course,
> open for discussion.
>
> Some major items that could be considered for the next release:
>
> Iceberg V3–related changes, the new data types and improvements around
> the existing stuff.
>
> Upgrading to newer Tez and Hadoop versions
>
> If Hadoop 3.5 is released by then, we should definitely chase that; it
> would be the official version supporting JDK 17.
>
> Please feel free to suggest additional features, fixes, or follow-ups
> from previous releases that you think should be included. It would
> also be good to call out any regressions or issues that need
> attention.
>
> Looking forward to hearing your thoughts.
>
>
> -Ayush
>
