No, the driver memory was not set explicitly. So it was likely the default value, which appears to be 1GB.
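[Editorial note: a quick way to confirm what the running Thrift server actually received is to ask it from a beeline session. A minimal sketch follows; note that spark.driver.memory is fixed at launch (via spark-defaults.conf or --driver-memory) and cannot be changed from a session, and an unset key typically comes back as <undefined>.]

-- Check the driver memory the Thrift server was started with
SET spark.driver.memory;
-- Check the executor memory for comparison
SET spark.executor.memory;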
On Thu, Aug 17, 2023, 16:49 Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

One question: what was the driver memory before setting it to 4G? Did you have it set at all before?

HTH

Mich Talebzadeh,
Solutions Architect/Engineering Lead
London
United Kingdom

view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>

https://en.everybodywiki.com/Mich_Talebzadeh

*Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.

On Thu, 17 Aug 2023 at 21:01, Patrick Tucci <patrick.tu...@gmail.com> wrote:

Hi Mich,

Here are my config values from spark-defaults.conf:

spark.eventLog.enabled true
spark.eventLog.dir hdfs://10.0.50.1:8020/spark-logs
spark.history.provider org.apache.spark.deploy.history.FsHistoryProvider
spark.history.fs.logDirectory hdfs://10.0.50.1:8020/spark-logs
spark.history.fs.update.interval 10s
spark.history.ui.port 18080
spark.sql.warehouse.dir hdfs://10.0.50.1:8020/user/spark/warehouse
spark.executor.cores 4
spark.executor.memory 16000M
spark.sql.legacy.createHiveTableByDefault false
spark.driver.host 10.0.50.1
spark.scheduler.mode FAIR
spark.driver.memory 4g #added 2023-08-17

The only application that runs on the cluster is the Spark Thrift server, which I launch like so:

~/spark/sbin/start-thriftserver.sh --master spark://10.0.50.1:7077

The cluster runs in standalone mode and does not use YARN for resource management. As a result, the Spark Thrift server acquires all available cluster resources when it starts. This is okay; as of right now, I am the only user of the cluster. If I add more users, they will also be SQL users, submitting queries through the Thrift server.

Let me know if you have any other questions or thoughts.

Thanks,

Patrick

On Thu, Aug 17, 2023 at 3:09 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

Hello Patrick,

As a matter of interest, what parameters and their respective values do you use in spark-submit? I assume it is running in YARN mode.

HTH

Mich Talebzadeh

On Thu, 17 Aug 2023 at 19:36, Patrick Tucci <patrick.tu...@gmail.com> wrote:

Hi Mich,

Yes, that's the sequence of events. I think the big breakthrough is that (for now at least) Spark is throwing errors instead of the queries hanging, which is a big step forward. I can at least troubleshoot issues if I know what they are.

When I reflect on the issues I faced and the solutions, my issue may have been driver memory all along.
I just couldn't determine that was the issue because I never saw any errors. In one case, converting a LEFT JOIN to an INNER JOIN caused the query to run. In another case, replacing a text field with an int ID and JOINing on the ID column worked. Per your advice, changing file formats from ORC to Parquet solved one issue. These interventions could have changed the way Spark needed to broadcast data to execute the query, thereby reducing demand on the memory-constrained driver.

Fingers crossed this is the solution. I will reply to this thread if the issue comes up again (hopefully it doesn't!).

Thanks again,

Patrick

On Thu, Aug 17, 2023 at 1:54 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

Hi Patrick,

Glad that you have managed to sort this problem out. Hopefully it will go away for good.

Still, we are in the dark about how this problem keeps going away and coming back :( As I recall, the chronology of events was as follows:

1. The issue with the hanging Spark job was reported.
2. Concurrency on the Hive metastore (single-threaded Derby DB) was identified as a possible cause.
3. You changed the underlying Hive table formats from ORC to Parquet and somehow it worked.
4. The issue was reported again.
5. You upgraded the Spark version from 3.4.0 to 3.4.1 (as a possible underlying issue) and encountered the driver memory limitation.
6. You allocated more memory to the driver and it is running OK for now.
7. It appears that you are doing a join between a large dataset and a smaller dataset. Spark decides to do a broadcast join by taking the smaller dataset, fitting it into the driver memory and broadcasting it to all executors. That is where you hit the memory limit on the driver. In the absence of a broadcast join, Spark needs to perform a shuffle, which is an expensive process.
   1. You can increase the broadcast join memory by setting the conf parameter "spark.sql.autoBroadcastJoinThreshold" in bytes (check the manual).
   2. You can also disable the broadcast join by setting "spark.sql.autoBroadcastJoinThreshold" to -1 to see what is happening.

So you still need to find a resolution to this issue. Maybe 3.4.1 has managed to fix some underlying issues.

HTH

Mich Talebzadeh
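[Editorial note: a minimal sketch of the two settings described in point 7 above, as they can be applied per session through beeline against the Thrift server. The values are in bytes; the 10 MB default and exact behaviour are worth confirming against the Spark SQL configuration docs for your version.]

-- Raise the size under which Spark will broadcast the smaller side of a join
-- (104857600 bytes = 100 MB; the default is 10485760 = 10 MB)
SET spark.sql.autoBroadcastJoinThreshold = 104857600;

-- Or disable automatic broadcast joins entirely, forcing a shuffle join instead
SET spark.sql.autoBroadcastJoinThreshold = -1;

-- Confirm the current value
SET spark.sql.autoBroadcastJoinThreshold;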
On Thu, 17 Aug 2023 at 17:17, Patrick Tucci <patrick.tu...@gmail.com> wrote:

Hi Everyone,

I just wanted to follow up on this issue. This issue has continued since our last correspondence. Today I had a query hang and couldn't resolve the issue. I decided to upgrade my Spark install from 3.4.0 to 3.4.1. After doing so, instead of the query hanging, I got an error message that the driver didn't have enough memory to broadcast objects. After increasing the driver memory, the query runs without issue.

I hope this can be helpful to someone else in the future. Thanks again for the support,

Patrick

On Sun, Aug 13, 2023 at 7:52 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

OK, I use Hive 3.1.1.

My suggestion is to put your Hive issues, and the Java version compatibility question, to u...@hive.apache.org. They will give you better info.

HTH

Mich Talebzadeh

On Sun, 13 Aug 2023 at 11:48, Patrick Tucci <patrick.tu...@gmail.com> wrote:

I attempted to install Hive yesterday. The experience was similar to other attempts at installing Hive: it took a few hours, and at the end of the process I didn't have a working setup. The latest stable release would not run. I never discovered the cause, but similar StackOverflow questions suggest it might be a Java incompatibility issue. Since I didn't want to downgrade or install an additional Java version, I attempted to use the latest alpha as well. This appears to have worked, although I couldn't figure out how to get it to use the metastore_db from Spark.

After turning my attention back to Spark, I determined the issue. After much troubleshooting, I discovered that if I performed a COUNT(*) using the same JOINs, the problem query worked. I removed all the columns from the SELECT statement and added them back one by one until I found the culprit: a text field on one of the tables. When the query SELECTs this column, or attempts to filter on it, the query hangs and never completes. If I remove all explicit references to this column, the query works fine. Since I need this column in the results, I went back to the ETL and extracted the values to a dimension table. I replaced the text column in the source table with an integer ID column and the query worked without issue.
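[Editorial note: a minimal sketch of the dimension-table workaround described above. All table and column names (Notes, NoteDim, NoteText, NoteID, MemberID, NoteDate) are hypothetical stand-ins for the actual schema, and the extraction is meant to run once in the ETL rather than at query time.]

-- Build a dimension table holding the distinct text values, keyed by a surrogate ID
CREATE TABLE NoteDim STORED AS PARQUET AS
SELECT ROW_NUMBER() OVER (ORDER BY NoteText) AS NoteID, NoteText
FROM (SELECT DISTINCT NoteText FROM Notes) t;

-- Rebuild the source table with the integer ID in place of the text column
CREATE TABLE Notes_new STORED AS PARQUET AS
SELECT n.MemberID, n.NoteDate, d.NoteID
FROM Notes n
JOIN NoteDim d ON n.NoteText = d.NoteText;

Queries then join back to NoteDim only when the text value is actually needed in the output.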
On the topic of Hive, does anyone have any detailed resources for how to set up Hive from scratch, aside from the official site? Those instructions didn't work for me. I'm starting to feel uneasy about building my process around Spark. There really shouldn't be any instances where I ask Spark to run legal ANSI SQL code and it just does nothing. In the past 4 days I've run into 2 of these instances, and the solution was more voodoo and magic than examining errors/logs and fixing code. I feel that I should have a contingency plan in place for when I run into an issue with Spark that can't be resolved.

Thanks everyone.

On Sat, Aug 12, 2023 at 2:18 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

OK, you would not have known unless you went through the process, so to speak.

Let us do something revolutionary here 😁

Install Hive and its metastore. You already have Hadoop anyway:

https://cwiki.apache.org/confluence/display/hive/adminmanual+installation

Hive metastore:

https://data-flair.training/blogs/apache-hive-metastore/#:~:text=What%20is%20Hive%20Metastore%3F,by%20using%20metastore%20service%20API

Choose one of these for the metastore database:

derby hive mssql mysql oracle postgres

Mine is an Oracle. Postgres is good as well.

HTH

Mich Talebzadeh

On Sat, 12 Aug 2023 at 18:31, Patrick Tucci <patrick.tu...@gmail.com> wrote:

Yes, on premise.

Unfortunately, after installing Delta Lake and re-writing all tables as Delta tables, the issue persists.

On Sat, Aug 12, 2023 at 11:34 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

OK, sure.

Is this Delta Lake going to be on-premise?

Mich Talebzadeh
On Sat, 12 Aug 2023 at 12:03, Patrick Tucci <patrick.tu...@gmail.com> wrote:

Hi Mich,

Thanks for the feedback. My original intention after reading your response was to stick with Hive for managing tables. Unfortunately, I'm running into another case of SQL scripts hanging. Since all tables are already Parquet, I'm out of troubleshooting options. I'm going to migrate to Delta Lake and see if that solves the issue.

Thanks again for your feedback.

Patrick

On Fri, Aug 11, 2023 at 10:09 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

Hi Patrick,

There is nothing wrong with Hive on-premise; it is the best data warehouse there is.

Hive handles both ORC and Parquet formats well. They are both columnar implementations of the relational model. What you are seeing is the Spark API to Hive, which prefers Parquet; I found that out a few years ago.

From your point of view, I suggest you stick to the Parquet format for Hive tables used through Spark. As far as I know, you don't have a fully independent Hive DB as yet.

Anyway, stick with Hive for now, as you never know what issues you may face moving to Delta Lake.

You can also use compression:

STORED AS PARQUET
TBLPROPERTIES ("parquet.compression"="SNAPPY")

Also:

ANALYZE TABLE <TABLE_NAME> COMPUTE STATISTICS FOR COLUMNS

HTH

Mich Talebzadeh
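[Editorial note: putting the two suggestions above together, a minimal sketch against the MemberEnrollment schema from later in the thread. The _parquet suffix and the column list for ANALYZE are illustrative choices, not something prescribed in the thread.]

-- Parquet table with Snappy compression
CREATE TABLE MemberEnrollment_parquet
(
  ID INT,
  MemberID VARCHAR(50),
  StartDate DATE,
  EndDate DATE
)
STORED AS PARQUET
TBLPROPERTIES ("parquet.compression"="SNAPPY");

-- Column-level statistics help the optimizer estimate sizes,
-- including whether a join side is small enough to broadcast
ANALYZE TABLE MemberEnrollment_parquet COMPUTE STATISTICS FOR COLUMNS ID, MemberID;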
On Fri, 11 Aug 2023 at 11:26, Patrick Tucci <patrick.tu...@gmail.com> wrote:

Thanks for the reply Stephen and Mich.

Stephen, you're right, it feels like Spark is waiting for something, but I'm not sure what. I'm the only user on the cluster and there are plenty of resources (+60 cores, +250GB RAM). I even tried restarting Hadoop, Spark and the host servers to make sure nothing was lingering in the background.

Mich, thank you so much, your suggestion worked. Storing the tables as Parquet solves the issue.

Interestingly, I found that only the MemberEnrollment table needs to be Parquet. The ID field in MemberEnrollment is an int calculated during load by a ROW_NUMBER() function. Further testing found that if I hard-code a 0 as MemberEnrollment.ID instead of using the ROW_NUMBER() function, the query works without issue even if both tables are ORC.

Should I infer from this issue that the Hive components prefer Parquet over ORC? Furthermore, should I consider using a different table storage framework, like Delta Lake, instead of the Hive components? Given this issue and other issues I've had with Hive, I'm starting to think a different solution might be more robust and stable. The main condition is that my application operates solely through the Thrift server, so I need to be able to connect to Spark through the Thrift server and have it write tables using Delta Lake instead of Hive. From this StackOverflow question, it looks like this is possible:
https://stackoverflow.com/questions/69862388/how-to-run-spark-sql-thrift-server-in-local-mode-and-connect-to-delta-using-jdbc

Thanks again to everyone who replied for their help.

Patrick

On Fri, Aug 11, 2023 at 2:14 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

Steve may have a valid point. You raised an issue with concurrent writes before, if I recall correctly; this limitation may be due to the Hive metastore. By default, Spark uses Apache Derby for its metastore database persistence. *However, it is limited to only one Spark session at any time for the purposes of metadata storage.* That may be the cause here as well. Does this happen if the underlying tables are created as PARQUET as opposed to ORC?

HTH

Mich Talebzadeh
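[Editorial note: a minimal sketch of the kind of format swap asked about above, assuming the ORC table can simply be rebuilt; the _pq and _orc names are illustrative.]

-- Rebuild the ORC table as Parquet with CREATE TABLE ... AS SELECT
CREATE TABLE MemberEnrollment_pq
STORED AS PARQUET
AS SELECT * FROM MemberEnrollment;

-- Once the data is verified, swap the names so existing queries keep working
ALTER TABLE MemberEnrollment RENAME TO MemberEnrollment_orc;
ALTER TABLE MemberEnrollment_pq RENAME TO MemberEnrollment;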
On Fri, 11 Aug 2023 at 01:33, Stephen Coy <s...@infomedia.com.au.invalid> wrote:

Hi Patrick,

When this has happened to me in the past (admittedly via spark-submit) it has been because another job was still running and had already claimed some of the resources (cores and memory).

I think this can also happen if your configuration tries to claim resources that will never be available.

Cheers,

SteveC

On 11 Aug 2023, at 3:36 am, Patrick Tucci <patrick.tu...@gmail.com> wrote:

Hello,

I'm attempting to run a query on Spark 3.4.0 through the Spark ThriftServer. The cluster has 64 cores, 250GB RAM, and operates in standalone mode using HDFS for storage.

The query is as follows:

SELECT ME.*, MB.BenefitID
FROM MemberEnrollment ME
JOIN MemberBenefits MB
ON ME.ID = MB.EnrollmentID
WHERE MB.BenefitID = 5
LIMIT 10

The tables are defined as follows:

-- Contains about 3M rows
CREATE TABLE MemberEnrollment
(
ID INT
, MemberID VARCHAR(50)
, StartDate DATE
, EndDate DATE
-- Other columns, but these are the most important
) STORED AS ORC;

-- Contains about 25M rows
CREATE TABLE MemberBenefits
(
EnrollmentID INT
, BenefitID INT
) STORED AS ORC;

When I execute the query, it runs a single broadcast exchange stage, which completes after a few seconds. Then everything just hangs. The JDBC/ODBC tab in the UI shows the query state as COMPILED, but no stages or tasks are executing or pending. [inline screenshot of the JDBC/ODBC tab omitted]

I've let the query run for as long as 30 minutes with no additional stages, progress, or errors. I'm not sure where to start troubleshooting.

Thanks for your help,

Patrick
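[Editorial note: given where the thread ends up (a broadcast that overwhelms an undersized driver), one low-cost check on a query that hangs like this is to inspect the physical plan before running it. A sketch, run through beeline against the Thrift server; a BroadcastExchange / BroadcastHashJoin node in the output means Spark plans to collect the smaller side to the driver and ship it to every executor, which is exactly where a 1g driver can run out of headroom.]

EXPLAIN FORMATTED
SELECT ME.*, MB.BenefitID
FROM MemberEnrollment ME
JOIN MemberBenefits MB
  ON ME.ID = MB.EnrollmentID
WHERE MB.BenefitID = 5
LIMIT 10;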