NeQuissimus opened a new issue, #7034:
URL: https://github.com/apache/iceberg/issues/7034

   ### Apache Iceberg version
   
   1.1.0 (latest release)
   
   ### Query engine
   
   Hive
   
   ### Please describe the bug 🐞
   
   # Context
   
   For context, this conversation started on Slack, but we decided to move it here for better visibility.
   cc @SinghAsDev 
   
   I believe this to be related to https://github.com/apache/iceberg/pull/5036
   
   # Code
   
   We use the Iceberg API directly: no Spark, no Trickle, etc.
   
   The code looks roughly like this and is driven by a Kafka consumer:
   
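   ```scala
   import org.apache.iceberg.{Table, Transaction}
   import org.apache.iceberg.catalog.TableIdentifier
   import org.apache.iceberg.hive.HiveCatalog

   // This is an object we keep around
   def catalog: HiveCatalog

   // for each message from Kafka, do the following
   val table: Table = catalog.loadTable(TableIdentifier.of("db", "table")) // placeholder identifier
   val transaction: Transaction = table.newTransaction()
   val append = transaction.newFastAppend()

   // add all data files from the Kafka message to `append`
   // ...

   append.commit()                 // stage the append inside the transaction
   transaction.commitTransaction() // ends up in HiveTableOperations.doCommit
   ```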
   
   # Observations
   
   With Iceberg 0.14 (and 1.0.0; we have not tested that extensively, but enough to say the issue is not present there), the number of running threads stays steady. A heap dump shows roughly 150 threads across the whole application.
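
A heap dump is not the only way to get this number. For a lighter-weight check, you can group the live threads by name from inside the JVM. This is a minimal sketch that uses only JDK calls. `threadCensus` is a hypothetical helper name and not an Iceberg API. It assumes Scala 2.13+ for `scala.jdk.CollectionConverters`, and it strips a trailing `-<number>` so that pool threads such as `iceberg-hive-lock-heartbeat-0` land in a single bucket.

```scala
import scala.jdk.CollectionConverters._

// Hypothetical diagnostic helper (not part of Iceberg): counts live JVM threads,
// grouped by name with any trailing "-<number>" removed.
def threadCensus(): Map[String, Int] =
  Thread.getAllStackTraces().keySet.asScala.toSeq
    .filter(_.isAlive)
    .groupBy(_.getName.replaceAll("-\\d+$", ""))
    .view
    .mapValues(_.size)
    .toMap

// Usage: print the five largest thread groups, e.g. from a periodic health check.
threadCensus().toSeq
  .sortBy { case (_, count) => -count }
  .take(5)
  .foreach { case (name, count) => println(f"$count%6d  $name") }
```

If you log the size of the `iceberg-hive-lock-heartbeat` bucket over time, the growth should become visible without taking repeated heap dumps.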
   
   After upgrading to Iceberg 1.1.0, the number of threads grows steadily with no obvious bound (after a few hours, we see about 30,000 extra threads :D).
   All of these threads are named `iceberg-hive-lock-heartbeat-0`, which is why I looked at #5036 right away; that change is also new in Iceberg 1.1.0.
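
Every leaked thread ends in `-0`. That pattern is consistent with each commit creating its own single-thread executor whose thread factory starts numbering at 0. A single long-lived pool would instead hand out `-1`, `-2`, and so on. The following is a hypothetical model of that pattern using plain JDK executors. It does not reproduce Iceberg's `HiveTableOperations` code. `heartbeatFactory`, `leakyCommit`, and `liveThreadsNamed` are illustrative names, and the sketch assumes Scala 2.13+.

```scala
import java.util.concurrent.{Executors, ScheduledExecutorService, ThreadFactory, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger
import scala.jdk.CollectionConverters._

// Hypothetical thread factory: like a "name-%d" format, each instance numbers its
// own threads starting from 0, so a fresh factory per executor always yields "-0".
def heartbeatFactory(): ThreadFactory = new ThreadFactory {
  private val counter = new AtomicInteger(0)
  def newThread(r: Runnable): Thread = {
    val t = new Thread(r, s"iceberg-hive-lock-heartbeat-${counter.getAndIncrement()}")
    t.setDaemon(true)
    t
  }
}

// One simulated commit: a new single-thread scheduler runs a no-op periodic
// "heartbeat" and is never shut down, so its thread outlives the commit.
def leakyCommit(): ScheduledExecutorService = {
  val scheduler = Executors.newSingleThreadScheduledExecutor(heartbeatFactory())
  scheduler.scheduleAtFixedRate(new Runnable { def run(): Unit = () }, 0L, 1L, TimeUnit.SECONDS)
  scheduler
}

// Counts live threads whose name matches exactly.
def liveThreadsNamed(name: String): Int =
  Thread.getAllStackTraces().keySet.asScala.count(t => t.isAlive && t.getName == name)

// Usage: ten "commits" leave ten extra threads, all with the same "-0" name.
val before = liveThreadsNamed("iceberg-hive-lock-heartbeat-0")
(1 to 10).foreach(_ => leakyCommit())
val after = liveThreadsNamed("iceberg-hive-lock-heartbeat-0")
println(s"leaked ${after - before} threads named iceberg-hive-lock-heartbeat-0")
```

Each call leaves one more idle thread behind. A core thread of a `ScheduledThreadPoolExecutor` keeps waiting on its task queue until the executor is shut down. Marking the thread as a daemon only means it will not keep the JVM from exiting; it does not stop the thread from piling up while the JVM runs.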
   
   
   My understanding is that committing the `Transaction` ends up in `HiveTableOperations.doCommit`.
   I do not see anything in `HiveTableOperations` that shuts down the scheduler used for the Hive lock heartbeats, though I am not sure whether that is even necessary.
   I also could not find any new `close()` methods on the objects we create.
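
If the cause is a per-commit scheduler that is never shut down, two remedies come to mind. One is to shut the scheduler down in a `finally` after each commit. The other is to share one scheduler across all commits and cancel each commit's heartbeat when that commit finishes. The sketch shows the second option. It is an illustration under assumptions, not Iceberg's actual fix. `SharedHeartbeat` and `commitWithHeartbeat` are hypothetical names, and `Thread.sleep` stands in for the metastore commit.

```scala
import java.util.concurrent.{Executors, ScheduledExecutorService, ThreadFactory, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger
import scala.jdk.CollectionConverters._

// Hypothetical sketch, NOT Iceberg's actual fix: one process-wide scheduler
// (a single daemon thread) serves the heartbeats of every commit.
object SharedHeartbeat {
  val scheduler: ScheduledExecutorService =
    Executors.newSingleThreadScheduledExecutor(new ThreadFactory {
      def newThread(r: Runnable): Thread = {
        val t = new Thread(r, "iceberg-hive-lock-heartbeat-0")
        t.setDaemon(true)
        t
      }
    })
}

// Runs `body` (standing in for the metastore commit) while a periodic heartbeat is
// scheduled, then always cancels that heartbeat; the shared thread is reused, not leaked.
def commitWithHeartbeat[A](heartbeat: () => Unit, intervalMs: Long)(body: => A): A = {
  val task = SharedHeartbeat.scheduler.scheduleAtFixedRate(
    new Runnable { def run(): Unit = heartbeat() },
    intervalMs, intervalMs, TimeUnit.MILLISECONDS)
  try body
  finally task.cancel(false)
}

// Usage: ten commits, each with its own heartbeat, all on the same thread.
val beats = new AtomicInteger(0)
(1 to 10).foreach { _ =>
  commitWithHeartbeat(() => { beats.incrementAndGet(); () }, intervalMs = 5L) {
    Thread.sleep(20L)
  }
}
val heartbeatThreads =
  Thread.getAllStackTraces().keySet.asScala.count(_.getName.startsWith("iceberg-hive-lock-heartbeat"))
println(s"heartbeat threads after 10 commits: $heartbeatThreads, beats sent: ${beats.get()}")
```

With this design the thread count no longer scales with the commit count. Cancelling the `ScheduledFuture` stops that commit's heartbeat, while the single daemon thread stays idle until the next commit.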
   
   
   

