Hi Ayush,

I am very happy to see that the community has started working on Hive 4.5. It is essential for the community to see clearly how Hive continues to evolve over time. I have 4.5 suggestions that I think are worth considering for the 4.5 release:
- #1 As you mentioned, Iceberg v3 is a major part of this release. I fully agree, and I think we should clearly highlight that Hive is one of the core engines supporting Iceberg v3, potentially even earlier than Trino or other competitors. One thing I would like to draw attention to (coming from discussions with the Apache Impala team) is that the Vector Delete spec seems to have changed, with row-lineage becoming a prerequisite. As far as I remember, this is not yet implemented in Hive. If we want Hive to officially support Iceberg v3 with vector deletes, we should verify and address this gap. https://iceberg.apache.org/spec/#row-lineage
- #2 Recently, Abstractdog (Laszlo) raised several ideas around potential performance improvements in LLAP and Tez. I think these are worth evaluating. Some may be configuration-level changes, while others could require architectural changes in Tez.
- #3 Building on the above, I believe the community is missing clear performance numbers. It would be great to introduce some form of standardized metrics that we can publish with every release, showing how Hive performs on a predefined dataset. I understand this would introduce additional testing and infra costs, but we could discuss internally how to manage these challenges. I am also happy to talk with Cloudera about potentially supporting this initiative.
- #4 Hadoop 3.5 support would be great. Do we plan to include a newer Tez version in 4.5? From what I can see, a significant number of changes have recently landed in the repository.
- #4.5 CVEs for 4.5: we should encourage contributors, especially new ones, to update dependencies to the latest versions where possible. Since Hive is used by large government agencies, these enterprise requirements should be reflected upstream as well. It is acceptable if the final number is zero, but making this part of the process would be a great baseline task.
Additionally, we could consider improving Dependabot usage or exploring newer AI-based tools to help with security improvements. (I know, I know it is basic)

-Attila

On Mon, Jan 19, 2026 at 10:12 AM Ayush Saxena <[email protected]> wrote:

> Hi folks,
> With the 4.2.0 release now behind us, I think this is a good time to
> start discussions around the next release. I wanted to start a thread
> where we can begin planning timelines and parking items that we would
> like to include.
>
> Based on some offline discussions, one initial thought is to target
> the next release roughly three months from now. This is, of course,
> open for discussion.
>
> Some major items that could be considered for the next release:
>
> Iceberg V3–related changes, the new data types and improvements around
> the existing stuff.
>
> Upgrading to newer Tez and Hadoop versions
>
> If Hadoop 3.5 is released by then, we should definitely chase that, it
> would be the official version supporting JDK 17
>
> Please feel free to suggest additional features, fixes, or follow-ups
> from previous releases that you think should be included. It would
> also be good to call out any regressions or issues that need
> attention.
>
> Looking forward to hearing your thoughts.
>
>
> -Ayush
>
