An update on this old thread.

I have been working through this in a real lab environment and can confirm
a few key points that may help others considering an upgrade to Hive 4.x.

I successfully upgraded to *Hive 4* running on *Hadoop 3.4.1* with an *Oracle
12c* metastore. During the process, I encountered transactional and locking
issues, which at first appeared to be general Hive 4 instability. However,
after deeper investigation, I traced the problem to corrupted transactional
counter tables in the metastore (specifically NEXT_TXN_ID and NEXT_LOCK_ID).
Once these were corrected, the system behaved as expected. This suggests
that some upgrade issues may stem from metastore state inconsistencies
rather than Hive 4 itself.
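For anyone wanting to check their own metastore for the same inconsistency, a minimal validation query is sketched below. It assumes the standard Hive metastore schema names (NEXT_TXN_ID.NTXN_NEXT, NEXT_LOCK_ID.NL_NEXT, TXNS.TXN_ID, HIVE_LOCKS.HL_LOCK_EXT_ID) and Oracle syntax (DUAL); adapt to your backend.

```sql
-- Run against the metastore DB itself (Oracle here). The counter in
-- NEXT_TXN_ID must be greater than the highest id already issued in TXNS;
-- likewise NEXT_LOCK_ID versus HIVE_LOCKS. If a counter lags behind the
-- ids already in use, lock/transaction acquisition fails.
SELECT (SELECT NTXN_NEXT FROM NEXT_TXN_ID)           AS next_txn,
       (SELECT MAX(TXN_ID) FROM TXNS)                AS max_txn,
       (SELECT NL_NEXT FROM NEXT_LOCK_ID)            AS next_lock,
       (SELECT MAX(HL_LOCK_EXT_ID) FROM HIVE_LOCKS)  AS max_lock
FROM DUAL;

-- If next_txn <= max_txn (or next_lock <= max_lock), reseed the counter
-- while the metastore is quiesced, e.g.:
--   UPDATE NEXT_TXN_ID SET NTXN_NEXT = <max_txn + 1>;
```

Take a backup of the metastore schema before touching these tables; they are the source of truth for ACID state.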

On the Spark side, I found that Spark 3.5.5 (my current version) is not
compatible with the Hive 4 metastore APIs when using the traditional
HiveExternalCatalog approach. This leads to errors such as "Invalid method
name: get_table".

*To work around this, I adopted a decoupled architecture:*

   - Spark handles data ingestion and transformation, writing data to Parquet
   in HDFS (or any Hadoop Compatible File System such as S3, GCS, etc.).

   - Hive is used purely for metadata and querying, accessed via JDBC and
   external tables.

This approach has proven to be stable and practical, and it preserves the
ability to use Spark alongside Hive 4 without relying on direct metastore
integration.
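The pattern can be sketched as below. Table name, columns, and path are illustrative, not from my actual setup; the Spark write step is shown only as a comment since Hive never sees it.

```sql
-- Spark side (conceptually, outside Hive):
--   df.write.mode("append").parquet("hdfs:///data/sales")

-- Hive side, via JDBC/beeline: an external table over the Spark-written
-- Parquet files. Hive owns only the metadata; dropping the table leaves
-- the underlying data in place.
CREATE EXTERNAL TABLE IF NOT EXISTS sales_ext (
  id       BIGINT,
  amount   DOUBLE,
  created  TIMESTAMP
)
STORED AS PARQUET
LOCATION 'hdfs:///data/sales';

-- For a partitioned layout, refresh Hive's view of new Spark output with:
--   MSCK REPAIR TABLE sales_ext;
```

Because no Spark-to-metastore Thrift calls are involved, the get_table incompatibility never comes into play.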

In summary:

   - The Hive 4 upgrade is feasible, but the metastore transactional tables
   must be carefully validated.

   - Spark integration requires a different approach, as direct metastore
   compatibility is currently a limitation.

   - A decoupled model (Spark → Parquet → HCFS → Hive via external tables)
   works effectively.

HTH

Dr Mich Talebzadeh,
Data Scientist | Distributed Systems (Spark) | Financial Forensics &
Metadata Analytics | Transaction Reconstruction | Audit & Evidence-Based
Analytics

   view my LinkedIn profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>





On Mon, 9 Jun 2025 at 12:01, Mich Talebzadeh <[email protected]>
wrote:

> Thanks Angel for offer of your help
>
> I added some comments to this thread
>
> https://github.com/apache/iceberg/issues/2387
>
> The problem was observed with a Postgres metastore DB as well, so I am not
> sure whether the cause is the transactional DB behind the metastore or not.
> This error may not be relevant to other Hive metastore backends, but it is
> worth investigating.
>
> cheers
>
>
> Dr Mich Talebzadeh,
> Architect | Data Science | Financial Crime | Forensic Analysis | GDPR
>
>    view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
>
>
>
> On Sat, 7 Jun 2025 at 08:08, Ángel Álvarez Pascua <
> [email protected]> wrote:
>
>> I'm also interested in this SPIP.
>> There was someone else also working on this, if I remember correctly.
>>
>> @Mich Talebzadeh <[email protected]> , if you need any help with
>> that issue, let me know.
>>
>> On Fri, 6 Jun 2025, 1:07, Mich Talebzadeh <[email protected]>
>> wrote:
>>
>>> I started working on this by upgrading my Hadoop to
>>>
>>> Hadoop 3.4.1
>>>
>>> My Hive is
>>>
>>> Driver: Hive JDBC (version 4.0.1)
>>> Transaction isolation: TRANSACTION_REPEATABLE_READ
>>> Running init script /home/hduser/dba/bin/add_jars.hql
>>> 25/06/05 23:33:44 [main]: WARN util.NativeCodeLoader: Unable to load
>>> native-hadoop library for your platform... using builtin-java classes where
>>> applicable
>>> 0: jdbc:hive2://rhes75:10099/default> set
>>> hive.support.concurrency=false;
>>> No rows affected (0.027 seconds)
>>> 0: jdbc:hive2://rhes75:10099/default>
>>> 0: jdbc:hive2://rhes75:10099/default> Beeline version 4.0.1 by Apache
>>> Hive
>>>
>>> Now we have transactional support with Hive 4, which we did not have with
>>> the prior versions. Compounding this, my metastore is on Oracle 12. I was
>>> messing around with
>>> set hive.support.concurrency=false;
>>> show databases
>>> . . . . . . . . . . . . . . . . . . > Error: Error running query
>>> (state=,code=0)
>>>  set hive.support.concurrency=true
>>> . . . . . . . . . . . . . . . . . . > No rows affected (0.002 seconds)
>>> 0: jdbc:hive2://rhes75:10099/default> show databases
>>> . . . . . . . . . . . . . . . . . . > +----------------+
>>> | database_name  |
>>> +----------------+
>>> | access         |
>>> | accounts       |
>>>
>>> So the last time I worked on it I was trying to sort out the concurrency
>>> issues
>>>
>>> Running simple queries on Hive
>>>
>>> FAILED: Error in acquiring locks: Error communicating with the metastore
>>> ERROR : FAILED: Error in acquiring locks: Error communicating with the
>>> metastore
>>> org.apache.hadoop.hive.ql.lockmgr.LockException: Error communicating
>>> with the metastore
>>>         at
>>> org.apache.hadoop.hive.ql.lockmgr.DbLockManager.lock(DbLockManager.java:183)
>>>         at
>>> org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.acquireLocks(DbTxnManager.java:475)
>>>         at
>>> org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.acquireLocksWithHeartbeatDelay(DbTxnManager.java:509)
>>>         at
>>> org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.acquireLocks(DbTxnManager.java:296)
>>>         at
>>> org.apache.hadoop.hive.ql.lockmgr.HiveTxnManagerImpl.acquireLocks(HiveTxnManagerImpl.java:81)
>>>         at
>>> org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.acquireLocks(DbTxnManager.java:101)
>>>         at
>>> org.apache.hadoop.hive.ql.DriverTxnHandler.acquireLocksInternal(DriverTxnHandler.java:328)
>>>         at
>>> org.apache.hadoop.hive.ql.DriverTxnHandler.acquireLocks(DriverTxnHandler.java:232)
>>>         at
>>> org.apache.hadoop.hive.ql.DriverTxnHandler.acquireLocksIfNeeded(DriverTxnHandler.java:144)
>>>         at
>>> org.apache.hadoop.hive.ql.Driver.lockAndRespond(Driver.java:356)
>>>         at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:197)
>>>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:154)
>>>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:149)
>>>         at
>>> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:185)
>>>         at
>>> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:236)
>>>         at
>>> org.apache.hive.service.cli.operation.SQLOperation.access$500(SQLOperation.java:90)
>>>         at
>>> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:336)
>>>         at java.security.AccessController.doPrivileged(Native Method)
>>>         at javax.security.auth.Subject.doAs(Subject.java:422)
>>>         at
>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1953)
>>>         at
>>> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:356)
>>>         at
>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>         at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>>         at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>>         at java.lang.Thread.run(Thread.java:748)
>>> Caused by: org.apache.thrift.TApplicationException: Internal error
>>> processing lock
>>>
>>> Unfortunately something is missing somewhere
>>>
>>> This error has been seen with a Postgres Hive metastore DB as well. I
>>> need to work on it when I have a chance.
>>>
>>> HTH
>>>
>>> Dr Mich Talebzadeh,
>>> Architect | Data Science | Financial Crime | Forensic Analysis | GDPR
>>>
>>>    view my Linkedin profile
>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>
>>>
>>>
>>>
>>>
>>> On Thu, 5 Jun 2025 at 22:20, Rozov, Vlad <[email protected]>
>>> wrote:
>>>
>>>> Hi All,
>>>>
>>>> I want to start a discussion thread on the SPIP titled "Upgrade Apache
>>>> Hive to 4.x" [JIRA <https://issues.apache.org/jira/browse/SPARK-52408>
>>>> ][Doc
>>>> <https://docs.google.com/document/d/1ejaGpuBvwBz2cD3Xj-QysShauBrdgYSh5yTxfAGvS1c/edit?usp=sharing>]
>>>> that I was researching for some time now.
>>>>
>>>> The SPIP proposes upgrading the Apache Hive version used in Apache
>>>> Spark builds from 2.3.10 to version 4.x. It also proposes discontinuing
>>>> support for Apache Hive 2.x and 3.x, as these versions are no longer
>>>> maintained by the Apache Hive community and have reached end-of-life (EOL).
>>>>
>>>> The key objectives of this proposal are to:
>>>>
>>>>
>>>>    - Maintain all existing functionality currently supported in Apache
>>>>    Hive 2.x that is compatible with Apache Hive 4.x
>>>>    - Ensure no functional or performance regressions occur
>>>>    - Provide the best upgrade path for current Apache Spark users,
>>>>    minimizing prerequisites and manual steps for those using Hive 2.x or 
>>>> 3.x
>>>>
>>>>
>>>> I'd greatly appreciate your feedback, thoughts, and suggestions on this
>>>> proposal!
>>>>
>>>> Thank you,
>>>>
>>>> Vlad
>>>>
>>>
