No, the driver memory was not set explicitly. So it was likely the default value, which appears to be 1GB.
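[Editorial note: a quick way to confirm what the running Thrift server actually received is to ask it from a beeline session. A minimal sketch follows; note that spark.driver.memory is fixed at launch (via spark-defaults.conf or --driver-memory) and cannot be changed from a session, and an unset key typically comes back as <undefined>.]

-- Check the driver memory the Thrift server was started with
SET spark.driver.memory;
-- Check the executor memory for comparison
SET spark.executor.memory;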
On Thu, Aug 17, 2023, 16:49 Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

One question: what was the driver memory before setting it to 4G? Did you have it set at all before?

HTH

Mich Talebzadeh,
Solutions Architect/Engineering Lead
London
United Kingdom

view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>

https://en.everybodywiki.com/Mich_Talebzadeh

*Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.

On Thu, 17 Aug 2023 at 21:01, Patrick Tucci <patrick.tu...@gmail.com> wrote:

Hi Mich,

Here are my config values from spark-defaults.conf:

spark.eventLog.enabled true
spark.eventLog.dir hdfs://10.0.50.1:8020/spark-logs
spark.history.provider org.apache.spark.deploy.history.FsHistoryProvider
spark.history.fs.logDirectory hdfs://10.0.50.1:8020/spark-logs
spark.history.fs.update.interval 10s
spark.history.ui.port 18080
spark.sql.warehouse.dir hdfs://10.0.50.1:8020/user/spark/warehouse
spark.executor.cores 4
spark.executor.memory 16000M
spark.sql.legacy.createHiveTableByDefault false
spark.driver.host 10.0.50.1
spark.scheduler.mode FAIR
spark.driver.memory 4g #added 2023-08-17

The only application that runs on the cluster is the Spark Thrift server, which I launch like so:

~/spark/sbin/start-thriftserver.sh --master spark://10.0.50.1:7077

The cluster runs in standalone mode and does not use YARN for resource management. As a result, the Spark Thrift server acquires all available cluster resources when it starts. This is okay; as of right now, I am the only user of the cluster. If I add more users, they will also be SQL users, submitting queries through the Thrift server.

Let me know if you have any other questions or thoughts.

Thanks,

Patrick

On Thu, Aug 17, 2023 at 3:09 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

Hello Patrick,

As a matter of interest, what parameters and their respective values do you use in spark-submit? I assume it is running in YARN mode.

HTH

Mich Talebzadeh

On Thu, 17 Aug 2023 at 19:36, Patrick Tucci <patrick.tu...@gmail.com> wrote:

Hi Mich,

Yes, that's the sequence of events. I think the big breakthrough is that (for now at least) Spark is throwing errors instead of the queries hanging, which is a big step forward. I can at least troubleshoot issues if I know what they are.

When I reflect on the issues I faced and the solutions, my issue may have been driver memory all along.
I just couldn't determine that was the issue because I never saw any errors. In one case, converting a LEFT JOIN to an INNER JOIN caused the query to run. In another case, replacing a text field with an int ID and JOINing on the ID column worked. Per your advice, changing file formats from ORC to Parquet solved one issue. These interventions could have changed the way Spark needed to broadcast data to execute the query, thereby reducing demand on the memory-constrained driver.

Fingers crossed this is the solution. I will reply to this thread if the issue comes up again (hopefully it doesn't!).

Thanks again,

Patrick

On Thu, Aug 17, 2023 at 1:54 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

Hi Patrick,

Glad that you have managed to sort this problem out. Hopefully it will go away for good.

Still, we are in the dark about how this problem keeps going away and coming back :( As I recall, the chronology of events was as follows:

1. The issue with the hanging Spark job was reported.
2. Concurrency on the Hive metastore (single-threaded Derby DB) was identified as a possible cause.
3. You changed the underlying Hive table formats from ORC to Parquet and somehow it worked.
4. The issue was reported again.
5. You upgraded the Spark version from 3.4.0 to 3.4.1 (as a possible underlying issue) and encountered the driver memory limitation.
6. You allocated more memory to the driver and it is running OK for now.
7. It appears that you are doing a join between a large dataset and a smaller dataset. Spark decides to do a broadcast join by taking the smaller dataset, fitting it into the driver memory and broadcasting it to all executors. That is where you hit the memory limit on the driver. In the absence of a broadcast join, Spark needs to perform a shuffle, which is an expensive process.
   1. You can increase the broadcast join memory by setting the conf parameter "spark.sql.autoBroadcastJoinThreshold" in bytes (check the manual).
   2. You can also disable the broadcast join by setting "spark.sql.autoBroadcastJoinThreshold" to -1 to see what is happening.

So you still need to find a resolution to this issue. Maybe 3.4.1 has managed to fix some underlying issues.

HTH

Mich Talebzadeh
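[Editorial note: a minimal sketch of the two settings described in point 7 above, as they can be applied per session through beeline against the Thrift server. The values are in bytes; the 10 MB default and exact behaviour are worth confirming against the Spark SQL configuration docs for your version.]

-- Raise the size under which Spark will broadcast the smaller side of a join
-- (104857600 bytes = 100 MB; the default is 10485760 = 10 MB)
SET spark.sql.autoBroadcastJoinThreshold = 104857600;

-- Or disable automatic broadcast joins entirely, forcing a shuffle join instead
SET spark.sql.autoBroadcastJoinThreshold = -1;

-- Confirm the current value
SET spark.sql.autoBroadcastJoinThreshold;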
On Thu, 17 Aug 2023 at 17:17, Patrick Tucci <patrick.tu...@gmail.com> wrote:

Hi Everyone,

I just wanted to follow up on this issue. This issue has continued since our last correspondence. Today I had a query hang and couldn't resolve the issue. I decided to upgrade my Spark install from 3.4.0 to 3.4.1. After doing so, instead of the query hanging, I got an error message that the driver didn't have enough memory to broadcast objects. After increasing the driver memory, the query runs without issue.

I hope this can be helpful to someone else in the future. Thanks again for the support,

Patrick

On Sun, Aug 13, 2023 at 7:52 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

OK, I use Hive 3.1.1.

My suggestion is to put your Hive issues, and the Java version compatibility question, to u...@hive.apache.org. They will give you better info.

HTH

Mich Talebzadeh

On Sun, 13 Aug 2023 at 11:48, Patrick Tucci <patrick.tu...@gmail.com> wrote:

I attempted to install Hive yesterday. The experience was similar to other attempts at installing Hive: it took a few hours, and at the end of the process I didn't have a working setup. The latest stable release would not run. I never discovered the cause, but similar StackOverflow questions suggest it might be a Java incompatibility issue. Since I didn't want to downgrade or install an additional Java version, I attempted to use the latest alpha as well. This appears to have worked, although I couldn't figure out how to get it to use the metastore_db from Spark.

After turning my attention back to Spark, I determined the issue. After much troubleshooting, I discovered that if I performed a COUNT(*) using the same JOINs, the problem query worked. I removed all the columns from the SELECT statement and added them back one by one until I found the culprit: a text field on one of the tables. When the query SELECTs this column, or attempts to filter on it, the query hangs and never completes. If I remove all explicit references to this column, the query works fine. Since I need this column in the results, I went back to the ETL and extracted the values to a dimension table. I replaced the text column in the source table with an integer ID column and the query worked without issue.
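[Editorial note: a minimal sketch of the dimension-table workaround described above. All table and column names (Notes, NoteDim, NoteText, NoteID, MemberID, NoteDate) are hypothetical stand-ins for the actual schema, and the extraction is meant to run once in the ETL rather than at query time.]

-- Build a dimension table holding the distinct text values, keyed by a surrogate ID
CREATE TABLE NoteDim STORED AS PARQUET AS
SELECT ROW_NUMBER() OVER (ORDER BY NoteText) AS NoteID, NoteText
FROM (SELECT DISTINCT NoteText FROM Notes) t;

-- Rebuild the source table with the integer ID in place of the text column
CREATE TABLE Notes_new STORED AS PARQUET AS
SELECT n.MemberID, n.NoteDate, d.NoteID
FROM Notes n
JOIN NoteDim d ON n.NoteText = d.NoteText;

Queries then join back to NoteDim only when the text value is actually needed in the output.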
On the topic of Hive, does anyone have any detailed resources for how to set up Hive from scratch, aside from the official site? Those instructions didn't work for me. I'm starting to feel uneasy about building my process around Spark. There really shouldn't be any instances where I ask Spark to run legal ANSI SQL code and it just does nothing. In the past 4 days I've run into 2 of these instances, and the solution was more voodoo and magic than examining errors/logs and fixing code. I feel that I should have a contingency plan in place for when I run into an issue with Spark that can't be resolved.

Thanks everyone.

On Sat, Aug 12, 2023 at 2:18 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

OK, you would not have known unless you went through the process, so to speak.

Let us do something revolutionary here 😁

Install Hive and its metastore. You already have Hadoop anyway:

https://cwiki.apache.org/confluence/display/hive/adminmanual+installation

Hive metastore:

https://data-flair.training/blogs/apache-hive-metastore/#:~:text=What%20is%20Hive%20Metastore%3F,by%20using%20metastore%20service%20API

Choose one of these for the metastore database:

derby hive mssql mysql oracle postgres

Mine is an Oracle. Postgres is good as well.

HTH

Mich Talebzadeh

On Sat, 12 Aug 2023 at 18:31, Patrick Tucci <patrick.tu...@gmail.com> wrote:

Yes, on premise.

Unfortunately, after installing Delta Lake and re-writing all tables as Delta tables, the issue persists.

On Sat, Aug 12, 2023 at 11:34 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

OK, sure.

Is this Delta Lake going to be on-premise?

Mich Talebzadeh
On Sat, 12 Aug 2023 at 12:03, Patrick Tucci <patrick.tu...@gmail.com> wrote:

Hi Mich,

Thanks for the feedback. My original intention after reading your response was to stick with Hive for managing tables. Unfortunately, I'm running into another case of SQL scripts hanging. Since all tables are already Parquet, I'm out of troubleshooting options. I'm going to migrate to Delta Lake and see if that solves the issue.

Thanks again for your feedback.

Patrick

On Fri, Aug 11, 2023 at 10:09 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

Hi Patrick,

There is nothing wrong with Hive on-premise; it is the best data warehouse there is.

Hive handles both ORC and Parquet formats well. They are both columnar implementations of the relational model. What you are seeing is the Spark API to Hive, which prefers Parquet; I found that out a few years ago.

From your point of view, I suggest you stick to the Parquet format for Hive tables used through Spark. As far as I know, you don't have a fully independent Hive DB as yet.

Anyway, stick with Hive for now, as you never know what issues you may face moving to Delta Lake.

You can also use compression:

STORED AS PARQUET
TBLPROPERTIES ("parquet.compression"="SNAPPY")

Also:

ANALYZE TABLE <TABLE_NAME> COMPUTE STATISTICS FOR COLUMNS

HTH

Mich Talebzadeh
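[Editorial note: putting the two suggestions above together, a minimal sketch against the MemberEnrollment schema from later in the thread. The _parquet suffix and the column list for ANALYZE are illustrative choices, not something prescribed in the thread.]

-- Parquet table with Snappy compression
CREATE TABLE MemberEnrollment_parquet
(
  ID INT,
  MemberID VARCHAR(50),
  StartDate DATE,
  EndDate DATE
)
STORED AS PARQUET
TBLPROPERTIES ("parquet.compression"="SNAPPY");

-- Column-level statistics help the optimizer estimate sizes,
-- including whether a join side is small enough to broadcast
ANALYZE TABLE MemberEnrollment_parquet COMPUTE STATISTICS FOR COLUMNS ID, MemberID;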
On Fri, 11 Aug 2023 at 11:26, Patrick Tucci <patrick.tu...@gmail.com> wrote:

Thanks for the reply Stephen and Mich.

Stephen, you're right, it feels like Spark is waiting for something, but I'm not sure what. I'm the only user on the cluster and there are plenty of resources (+60 cores, +250GB RAM). I even tried restarting Hadoop, Spark and the host servers to make sure nothing was lingering in the background.

Mich, thank you so much, your suggestion worked. Storing the tables as Parquet solves the issue.

Interestingly, I found that only the MemberEnrollment table needs to be Parquet. The ID field in MemberEnrollment is an int calculated during load by a ROW_NUMBER() function. Further testing found that if I hard-code a 0 as MemberEnrollment.ID instead of using the ROW_NUMBER() function, the query works without issue even if both tables are ORC.

Should I infer from this issue that the Hive components prefer Parquet over ORC? Furthermore, should I consider using a different table storage framework, like Delta Lake, instead of the Hive components? Given this issue and other issues I've had with Hive, I'm starting to think a different solution might be more robust and stable. The main condition is that my application operates solely through the Thrift server, so I need to be able to connect to Spark through the Thrift server and have it write tables using Delta Lake instead of Hive. From this StackOverflow question, it looks like this is possible:
https://stackoverflow.com/questions/69862388/how-to-run-spark-sql-thrift-server-in-local-mode-and-connect-to-delta-using-jdbc

Thanks again to everyone who replied for their help.

Patrick

On Fri, Aug 11, 2023 at 2:14 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

Steve may have a valid point. You raised an issue with concurrent writes before, if I recall correctly; this limitation may be due to the Hive metastore. By default, Spark uses Apache Derby for its metastore database persistence. *However, it is limited to only one Spark session at any time for the purposes of metadata storage.* That may be the cause here as well. Does this happen if the underlying tables are created as PARQUET as opposed to ORC?

HTH

Mich Talebzadeh
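[Editorial note: a minimal sketch of the kind of format swap asked about above, assuming the ORC table can simply be rebuilt; the _pq and _orc names are illustrative.]

-- Rebuild the ORC table as Parquet with CREATE TABLE ... AS SELECT
CREATE TABLE MemberEnrollment_pq
STORED AS PARQUET
AS SELECT * FROM MemberEnrollment;

-- Once the data is verified, swap the names so existing queries keep working
ALTER TABLE MemberEnrollment RENAME TO MemberEnrollment_orc;
ALTER TABLE MemberEnrollment_pq RENAME TO MemberEnrollment;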
On Fri, 11 Aug 2023 at 01:33, Stephen Coy <s...@infomedia.com.au.invalid> wrote:

Hi Patrick,

When this has happened to me in the past (admittedly via spark-submit) it has been because another job was still running and had already claimed some of the resources (cores and memory).

I think this can also happen if your configuration tries to claim resources that will never be available.

Cheers,

SteveC

On 11 Aug 2023, at 3:36 am, Patrick Tucci <patrick.tu...@gmail.com> wrote:

Hello,

I'm attempting to run a query on Spark 3.4.0 through the Spark ThriftServer. The cluster has 64 cores, 250GB RAM, and operates in standalone mode using HDFS for storage.

The query is as follows:

SELECT ME.*, MB.BenefitID
FROM MemberEnrollment ME
JOIN MemberBenefits MB
ON ME.ID = MB.EnrollmentID
WHERE MB.BenefitID = 5
LIMIT 10

The tables are defined as follows:

-- Contains about 3M rows
CREATE TABLE MemberEnrollment
(
ID INT
, MemberID VARCHAR(50)
, StartDate DATE
, EndDate DATE
-- Other columns, but these are the most important
) STORED AS ORC;

-- Contains about 25M rows
CREATE TABLE MemberBenefits
(
EnrollmentID INT
, BenefitID INT
) STORED AS ORC;

When I execute the query, it runs a single broadcast exchange stage, which completes after a few seconds. Then everything just hangs. The JDBC/ODBC tab in the UI shows the query state as COMPILED, but no stages or tasks are executing or pending. [inline screenshot of the JDBC/ODBC tab omitted]

I've let the query run for as long as 30 minutes with no additional stages, progress, or errors. I'm not sure where to start troubleshooting.

Thanks for your help,

Patrick
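[Editorial note: given where the thread ends up (a broadcast that overwhelms an undersized driver), one low-cost check on a query that hangs like this is to inspect the physical plan before running it. A sketch, run through beeline against the Thrift server; a BroadcastExchange / BroadcastHashJoin node in the output means Spark plans to collect the smaller side to the driver and ship it to every executor, which is exactly where a 1g driver can run out of headroom.]

EXPLAIN FORMATTED
SELECT ME.*, MB.BenefitID
FROM MemberEnrollment ME
JOIN MemberBenefits MB
  ON ME.ID = MB.EnrollmentID
WHERE MB.BenefitID = 5
LIMIT 10;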