Mailing lists: for broad or opinion-based questions, requests for external resources, debugging issues, bug reports, contributing to the project, and similar scenarios, it is recommended you use the user@spark.apache.org mailing list.
- user@spark.apache.org <https://lists.apache.org/list.html?user@spark.apache.org> is for usage questions, help, and announcements.
  (subscribe) <user-subscr...@spark.apache.org?subject=(send%20this%20email%20to%20subscribe)>
  (unsubscribe) <user-unsubscr...@spark.apache.org?subject=(send%20this%20email%20to%20unsubscribe)>
  (archives) <https://lists.apache.org/list.html?user@spark.apache.org>
- d...@spark.apache.org <https://lists.apache.org/list.html?d...@spark.apache.org> is for people who want to contribute code to Spark.
  (subscribe) <dev-subscr...@spark.apache.org?subject=(send%20this%20email%20to%20subscribe)>
  (unsubscribe) <dev-unsubscr...@spark.apache.org?subject=(send%20this%20email%20to%20unsubscribe)>
  (archives) <https://lists.apache.org/list.html?d...@spark.apache.org>

On Mon, 25 Dec 2023 at 04:58, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Well, not to put too fine a point on it: in a public forum, one ought to respect the importance of open communication. Everyone has the right to ask questions, seek information, and engage in discussions without facing unnecessary patronization.
>
> Mich Talebzadeh,
> Dad | Technologist | Solutions Architect | Engineer
> London, United Kingdom
>
> view my LinkedIn profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
> https://en.everybodywiki.com/Mich_Talebzadeh
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.
>
> On Sun, 24 Dec 2023 at 18:27, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
>
>> This is a user-list question, not a dev-list question. Moving this conversation to the user list and BCC-ing the dev list.
>> Also, this statement
>>
>> > We are not validating against table or column existence.
>>
>> is not correct. When you call spark.sql(…), Spark will look up the table references and fail with TABLE_OR_VIEW_NOT_FOUND if it cannot find them.
>>
>> Also, when you run DDL via spark.sql(…), Spark will actually run it. So spark.sql("drop table my_table") will actually drop my_table. It's not a validation-only operation.
>>
>> This question of validating SQL is already discussed on Stack Overflow <https://stackoverflow.com/q/46973729/877069>. You may find some useful tips there.
>>
>> Nick
>>
>> On Dec 24, 2023, at 4:52 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>
>> Yes, you can validate the syntax of your PySpark SQL queries without connecting to an actual dataset or running the queries on a cluster. PySpark lets you catch parse errors without executing the query. Something like below:
>>
>>       ____              __
>>      / __/__  ___ _____/ /__
>>     _\ \/ _ \/ _ `/ __/ '_/
>>    /__ / .__/\_,_/_/ /_/\_\   version 3.4.0
>>       /_/
>>
>> Using Python version 3.9.16 (main, Apr 24 2023 10:36:11)
>> Spark context Web UI available at http://rhes75:4040
>> Spark context available as 'sc' (master = local[*], app id = local-1703410019374).
>> SparkSession available as 'spark'.
>> >>> from pyspark.sql import SparkSession
>> >>> spark = SparkSession.builder.appName("validate").getOrCreate()
>> 23/12/24 09:28:02 WARN SparkSession: Using an existing Spark session; only runtime SQL configurations will take effect.
>> >>> sql = "SELECT * FROM <TABLE> WHERE <COLUMN> = some value"
>> >>> try:
>> ...     spark.sql(sql)
>> ...     print("is working")
>> ... except Exception as e:
>> ...     print(f"Syntax error: {e}")
>> ...
>> Syntax error:
>> [PARSE_SYNTAX_ERROR] Syntax error at or near '<'.(line 1, pos 14)
>>
>> == SQL ==
>> SELECT * FROM <TABLE> WHERE <COLUMN> = some value
>> --------------^^^
>>
>> Here we only check for syntax errors, not the actual query semantics. We are not validating against table or column existence.
>>
>> This method is useful when you want to catch obvious syntax errors before submitting your PySpark job to a cluster, especially when you don't have access to the actual data.
>>
>> In summary:
>>
>> - This method validates syntax but will not catch semantic errors.
>> - If you need more comprehensive validation, consider using a testing framework and a small dataset.
>> - For complex queries, a linter or code analysis tool can help identify potential issues.
>>
>> HTH
>>
>> Mich Talebzadeh,
>> Dad | Technologist | Solutions Architect | Engineer
>> London, United Kingdom
>>
>> On Sun, 24 Dec 2023 at 07:57, ram manickam <ramsidm...@gmail.com> wrote:
>>
>>> Hello,
>>> Is there a way to validate PySpark SQL for syntax errors only? I cannot connect to the actual data set to perform this validation. Any help would be appreciated.
>>>
>>> Thanks
>>> Ram

--
Bjørn Jørgensen
Vestre Aspehaug 4, 6010 Ålesund, Norge
+47 480 94 297