Mailing lists: for broad or opinion-based questions, requests for external resources, debugging issues, bug reports, contributing to the project, and similar scenarios, it is recommended you use the user@spark.apache.org mailing list.
- user@spark.apache.org <https://lists.apache.org/list.html?user@spark.apache.org> is for usage questions, help, and announcements.
  (subscribe) <user-subscr...@spark.apache.org?subject=(send%20this%20email%20to%20subscribe)>
  (unsubscribe) <user-unsubscr...@spark.apache.org?subject=(send%20this%20email%20to%20unsubscribe)>
  (archives) <https://lists.apache.org/list.html?user@spark.apache.org>
- d...@spark.apache.org <https://lists.apache.org/list.html?d...@spark.apache.org> is for people who want to contribute code to Spark.
  (subscribe) <dev-subscr...@spark.apache.org?subject=(send%20this%20email%20to%20subscribe)>
  (unsubscribe) <dev-unsubscr...@spark.apache.org?subject=(send%20this%20email%20to%20unsubscribe)>
  (archives) <https://lists.apache.org/list.html?d...@spark.apache.org>

On Mon, 25 Dec 2023 at 04:58, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Well, not to put too fine a point on it: in a public forum, one ought to respect the importance of open communication. Everyone has the right to ask questions, seek information, and engage in discussions without facing unnecessary patronization.
>
> Mich Talebzadeh,
> Dad | Technologist | Solutions Architect | Engineer
> London, United Kingdom
>
> view my LinkedIn profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
> https://en.everybodywiki.com/Mich_Talebzadeh
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.
>
> On Sun, 24 Dec 2023 at 18:27, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
>
>> This is a user-list question, not a dev-list question. Moving this conversation to the user list and BCC-ing the dev list.
>> Also, this statement
>>
>> > We are not validating against table or column existence.
>>
>> is not correct. When you call spark.sql(…), Spark will look up the table references and fail with TABLE_OR_VIEW_NOT_FOUND if it cannot find them.
>>
>> Also, when you run DDL via spark.sql(…), Spark will actually run it. So spark.sql("drop table my_table") will actually drop my_table. It's not a validation-only operation.
>>
>> This question of validating SQL is already discussed on Stack Overflow <https://stackoverflow.com/q/46973729/877069>. You may find some useful tips there.
>>
>> Nick
>>
>> On Dec 24, 2023, at 4:52 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>
>> Yes, you can validate the syntax of your PySpark SQL queries without connecting to an actual dataset or running the queries on a cluster. PySpark lets you catch parse errors without executing the query. Something like below:
>>
>>       ____              __
>>      / __/__  ___ _____/ /__
>>     _\ \/ _ \/ _ `/ __/ '_/
>>    /__ / .__/\_,_/_/ /_/\_\   version 3.4.0
>>       /_/
>>
>> Using Python version 3.9.16 (main, Apr 24 2023 10:36:11)
>> Spark context Web UI available at http://rhes75:4040
>> Spark context available as 'sc' (master = local[*], app id = local-1703410019374).
>> SparkSession available as 'spark'.
>> >>> from pyspark.sql import SparkSession
>> >>> spark = SparkSession.builder.appName("validate").getOrCreate()
>> 23/12/24 09:28:02 WARN SparkSession: Using an existing Spark session; only runtime SQL configurations will take effect.
>> >>> sql = "SELECT * FROM <TABLE> WHERE <COLUMN> = some value"
>> >>> try:
>> ...     spark.sql(sql)
>> ...     print("is working")
>> ... except Exception as e:
>> ...     print(f"Syntax error: {e}")
>> ...
>> Syntax error:
>> [PARSE_SYNTAX_ERROR] Syntax error at or near '<'.(line 1, pos 14)
>>
>> == SQL ==
>> SELECT * FROM <TABLE> WHERE <COLUMN> = some value
>> --------------^^^
>>
>> Here we only check for syntax errors, not the actual query semantics. We are not validating against table or column existence.
>>
>> This method is useful when you want to catch obvious syntax errors before submitting your PySpark job to a cluster, especially when you don't have access to the actual data.
>>
>> In summary:
>>
>> - This method validates syntax but will not catch semantic errors.
>> - If you need more comprehensive validation, consider using a testing framework and a small dataset.
>> - For complex queries, a linter or code analysis tool can help identify potential issues.
>>
>> HTH
>>
>> Mich Talebzadeh,
>> Dad | Technologist | Solutions Architect | Engineer
>> London, United Kingdom
>>
>> On Sun, 24 Dec 2023 at 07:57, ram manickam <ramsidm...@gmail.com> wrote:
>>
>>> Hello,
>>> Is there a way to validate PySpark SQL for syntax errors only? I cannot connect to the actual data set to perform this validation. Any help would be appreciated.
>>>
>>> Thanks
>>> Ram

--
Bjørn Jørgensen
Vestre Aspehaug 4, 6010 Ålesund, Norge
+47 480 94 297