[
https://issues.apache.org/jira/browse/HIVE-28808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17933641#comment-17933641
]
Quanlong Huang edited comment on HIVE-28808 at 3/10/25 3:16 AM:
----------------------------------------------------------------
The current code fetches a batch of old events and deletes them using
PersistenceManager.deletePersistentAll():
https://github.com/apache/hive/blob/56a18bbba94f7cc099cb8dd1ab5e243a77fecd3f/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L11603
An alternative is to issue a DELETE query directly. I uploaded a PR for review:
https://github.com/apache/hive/pull/5688
BTW, the OOM issue in this JIRA can be worked around by setting
hive.metastore.event.db.clean.maxevents to a smaller value. The default is
10000. I tried 100 and it works in my env.
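For illustration, a direct DELETE along the lines suggested above could be
sketched in plain JDBC as follows. This is only a sketch under assumptions, not
necessarily how the PR does it: the class NotificationLogCleaner and its methods
are hypothetical, and the quoted table and column names ("NOTIFICATION_LOG",
"EVENT_TIME", with EVENT_TIME holding epoch seconds) are assumed from the
PostgreSQL metastore schema and should be checked against the actual backend.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** Hypothetical helper, not part of Hive: removes expired events with one SQL statement. */
final class NotificationLogCleaner {
    // Assumed table and column names (PostgreSQL-style quoting); verify against your schema.
    static final String DELETE_SQL =
        "DELETE FROM \"NOTIFICATION_LOG\" WHERE \"EVENT_TIME\" < ?";

    /** Events older than (now - ttl) are eligible; EVENT_TIME is assumed to be epoch seconds. */
    static int cutoffSeconds(long nowMillis, long ttlSeconds) {
        return Math.toIntExact(nowMillis / 1000L - ttlSeconds);
    }

    /** Runs the DELETE on the database side, so no event rows are loaded into the JVM heap. */
    static int deleteOlderThan(Connection conn, int cutoffSeconds) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(DELETE_SQL)) {
            ps.setInt(1, cutoffSeconds);
            return ps.executeUpdate(); // number of rows deleted
        }
    }
}
```

Because the statement runs entirely in the database, the cleaner never
materializes MNotificationLog objects. The trade-off is that a single statement
removes all expired rows at once rather than in batches.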
was (Author: stiga-huang):
The current code fetches a batch of old events and deletes them using
PersistenceManager.deletePersistentAll():
https://github.com/apache/hive/blob/56a18bbba94f7cc099cb8dd1ab5e243a77fecd3f/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L11603
An alternative is to use a DELETE FROM query directly. However, that can't do
the job in batches. When there are lots of old events, this might
impact inserting new events.
The issue in this JIRA can be worked around by setting
hive.metastore.event.db.clean.maxevents to a smaller value. The default is
10000. I tried 100 and it works in my env.
> DB-Notification-Cleaner thread dies in startup due to OOM
> ---------------------------------------------------------
>
> Key: HIVE-28808
> URL: https://issues.apache.org/jira/browse/HIVE-28808
> Project: Hive
> Issue Type: Bug
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Major
> Labels: pull-request-available
>
> Saw this when launching HMS on a huge NOTIFICATION_LOG table.
> {noformat}
> 2025-03-07 19:37:18: Starting Hive Metastore Server
> Listening for transport dt_socket at address: 30010
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in
> [jar:file:/home/quanlong/workspace/Impala/toolchain/cdp_components-58457853/apache-hive-3.1.3000.7.3.1.0-160-bin/lib/log4j-slf4j-impl-2.18.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in
> [jar:file:/home/quanlong/workspace/Impala/toolchain/cdp_components-58457853/hadoop-3.1.1.7.3.1.0-160/share/hadoop/common/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
> explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Exception in thread "DB-Notification-Cleaner" java.lang.OutOfMemoryError:
> Java heap space
> at java.lang.StringCoding.decode(StringCoding.java:215)
> at java.lang.String.<init>(String.java:463)
> at org.postgresql.core.Encoding.decode(Encoding.java:284)
> at org.postgresql.core.Encoding.decode(Encoding.java:295)
> at org.postgresql.jdbc.PgResultSet.getString(PgResultSet.java:2256)
> at
> com.zaxxer.hikari.pool.HikariProxyResultSet.getString(HikariProxyResultSet.java)
> at
> org.datanucleus.store.rdbms.mapping.column.LongVarcharColumnMapping.getString(LongVarcharColumnMapping.java:102)
> at
> org.datanucleus.store.rdbms.mapping.java.SingleFieldMapping.getString(SingleFieldMapping.java:188)
> at
> org.datanucleus.store.rdbms.fieldmanager.ResultSetGetter.fetchStringField(ResultSetGetter.java:133)
> at
> org.datanucleus.state.StateManagerImpl.replacingStringField(StateManagerImpl.java:1986)
> at
> org.apache.hadoop.hive.metastore.model.MNotificationLog.dnReplaceField(MNotificationLog.java)
> at
> org.apache.hadoop.hive.metastore.model.MNotificationLog.dnReplaceFields(MNotificationLog.java)
> at
> org.datanucleus.state.StateManagerImpl.replaceFields(StateManagerImpl.java:4352)
> at
> org.datanucleus.store.rdbms.query.PersistentClassROF$1.fetchFields(PersistentClassROF.java:528)
> at
> org.datanucleus.state.StateManagerImpl.loadFieldValues(StateManagerImpl.java:3743)
> at
> org.datanucleus.state.StateManagerImpl.initialiseForHollow(StateManagerImpl.java:383)
> at
> org.datanucleus.state.ObjectProviderFactoryImpl.newForHollow(ObjectProviderFactoryImpl.java:99)
> at
> org.datanucleus.ExecutionContextImpl.findObject(ExecutionContextImpl.java:3199)
> at
> org.datanucleus.store.rdbms.query.PersistentClassROF.findObjectWithIdAndLoadFields(PersistentClassROF.java:523)
> at
> org.datanucleus.store.rdbms.query.PersistentClassROF.getObject(PersistentClassROF.java:456)
> at
> org.datanucleus.store.rdbms.query.ForwardQueryResult.nextResultSetElement(ForwardQueryResult.java:181)
> at
> org.datanucleus.store.rdbms.query.ForwardQueryResult$QueryResultIterator.next(ForwardQueryResult.java:409)
> at
> org.datanucleus.store.rdbms.query.ForwardQueryResult.processNumberOfResults(ForwardQueryResult.java:137)
> at
> org.datanucleus.store.rdbms.query.ForwardQueryResult.advanceToEndOfResultSet(ForwardQueryResult.java:165)
> at
> org.datanucleus.store.rdbms.query.ForwardQueryResult.getSizeUsingMethod(ForwardQueryResult.java:519)
> at
> org.datanucleus.store.query.AbstractQueryResult.size(AbstractQueryResult.java:256)
> at
> org.apache.hadoop.hive.metastore.ObjectStore.doCleanNotificationEvents(ObjectStore.java:12213)
> at
> org.apache.hadoop.hive.metastore.ObjectStore.cleanOlderEvents(ObjectStore.java:12175)
> at
> org.apache.hadoop.hive.metastore.ObjectStore.cleanNotificationEvents(ObjectStore.java:12161)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> {noformat}
> This is a downstream build. The related Hive code is
> https://github.com/apache/hive/blob/d0372808177a823d63383e311c5909aa46b9a961/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L11577
> It seems we should optimize the code that cleans old events.