Aman Raj created SOLR-17220:
-------------------------------
Summary: SolrZkClient should be a daemon-thread
Key: SOLR-17220
URL: https://issues.apache.org/jira/browse/SOLR-17220
Project: Solr
Issue Type: Bug
Security Level: Public (Default Security Level. Issues are Public)
Reporter: Aman Raj
I am submitting a Spark SQL job through spark-submit, using Spark 3.3.1 and
Kyuubi 1.8.0. I am using the open-source Spark engine with the Kyuubi Authz
module running on top of the Spark Driver in client mode. The Spark job
succeeds, but the Spark Driver does not stop: it keeps running, and I see
the PolicyRefresher continuing to poll policies from Ranger.
[Screenshot of the Spark Driver logs; image not included]
As the logs show, PolicyRefresher is still running even after the Spark Context
has stopped. As a result, the Spark Driver never exits, and after some time I
have to kill the job manually.
The issue is that the SolrZkClient used for logging to Solr currently starts
a non-daemon thread inside the Spark Driver JVM. While the Spark Driver was
stuck, I collected the jstack output for the non-daemon threads:
{{"zkConnectionManagerCallback-5-thread-1" #202 prio=5 os_prio=0 tid=0x00007f1cbc003000 nid=0x4ca7 waiting on condition [0x00007f1bc3a01000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x00000007b218dad8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)

"DestroyJavaVM" #453 prio=5 os_prio=0 tid=0x00007f1dfc019000 nid=0x4ab5 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"VM Thread" os_prio=0 tid=0x00007f1dfc09a000 nid=0x4ac3 runnable

"VM Periodic Task Thread" os_prio=0 tid=0x00007f1dfc10e800 nid=0x4ad4 waiting on condition}}
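For anyone who wants to reproduce this check from inside the JVM rather than by filtering jstack output, here is a minimal sketch in plain Java. The class and method names (NonDaemonThreads, liveNonDaemonThreadNames) are hypothetical and are not part of Solr, Spark, or Kyuubi; the sketch only uses the standard Thread API. Note that it sees Java threads only, not JVM-internal ones such as "VM Thread".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only: lists the live threads that
// can keep the JVM from exiting (i.e. threads that are not daemon threads).
class NonDaemonThreads {

    /** Returns the names of all live Java threads that are not daemon threads. */
    static List<String> liveNonDaemonThreadNames() {
        List<String> names = new ArrayList<>();
        // getAllStackTraces() returns a snapshot keyed by every live thread.
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.isAlive() && !t.isDaemon()) {
                names.add(t.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Prints one thread name per line.
        liveNonDaemonThreadNames().forEach(System.out::println);
    }
}
```

Calling such a helper after the Spark Context has stopped would presumably list the zkConnectionManagerCallback thread seen in the dump above, but that is an assumption about this environment, not something verified here.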
So when the Spark Context stops, SolrZkClient prevents the Spark Driver from
exiting, because its thread is currently non-daemon, as shown below:
[Screenshot; image not included]
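To illustrate the kind of change being requested, here is a minimal sketch of an executor whose worker threads are daemon threads. This is not Solr's actual code: DaemonThreadFactory and the "callback" name prefix are hypothetical, and only standard java.util.concurrent calls are used. The stack trace above shows an idle ThreadPoolExecutor worker parked in LinkedBlockingQueue.take; with a factory like this one, such an idle worker would no longer keep the JVM alive.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical thread factory, for illustration only: every thread it
// creates is marked as a daemon, so it cannot block JVM shutdown.
class DaemonThreadFactory implements ThreadFactory {
    private final String prefix;
    private final AtomicInteger counter = new AtomicInteger(1);

    DaemonThreadFactory(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public Thread newThread(Runnable r) {
        Thread t = new Thread(r, prefix + "-thread-" + counter.getAndIncrement());
        // Must be set before the thread is started.
        t.setDaemon(true);
        return t;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool =
            Executors.newSingleThreadExecutor(new DaemonThreadFactory("callback"));
        // Run one task and wait for it to finish.
        pool.submit(() -> System.out.println(
            Thread.currentThread().getName()
                + " daemon=" + Thread.currentThread().isDaemon())).get();
        // Deliberately no pool.shutdown(): the idle worker stays parked on the
        // queue, but since it is a daemon thread the JVM can still exit.
    }
}
```

The trade-off is that daemon threads are stopped abruptly at JVM exit, so any work still queued on the executor would be dropped. An explicit shutdown of the executor when the client is closed would be the alternative.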