Hi Mich,

Thanks for the suggestions. I checked the documentation regarding the data type issue and found that the mismatch was caused by different time zone settings between Spark and Snowflake. Specifying the time zone in the Spark options while writing the data to Snowflake worked 😁

Documentation link: https://docs.snowflake.com/en/user-guide/spark-connector-use#working-with-timestamps-and-time-zones
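For reference, this is roughly what the working write looks like. It is a minimal sketch only (simplified to a batch write for brevity; connection details, the S3 path, and the table name are placeholders), using the connector's sfTimezone option described on the page above:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("s3-to-snowflake").getOrCreate()

# Pin the Spark session time zone so timestamps are interpreted
# consistently before they reach the connector.
spark.conf.set("spark.sql.session.timeZone", "UTC")

df = spark.read.parquet("s3://my-bucket/input/")  # hypothetical source path

sf_options = {
    "sfURL": "<account>.snowflakecomputing.com",  # placeholder connection
    "sfUser": "<user>",                           # details only
    "sfPassword": "<password>",
    "sfDatabase": "<database>",
    "sfSchema": "<schema>",
    "sfWarehouse": "<warehouse>",
    # The fix: tell the connector which time zone to use for TIMESTAMP
    # columns. Accepted values include "spark", "snowflake", "sf_current",
    # or an explicit zone ID such as "America/New_York".
    "sfTimezone": "spark",
}

(df.write
    .format("net.snowflake.spark.snowflake")  # "snowflake" also works on Databricks
    .options(**sf_options)
    .option("dbtable", "TARGET_TABLE")        # hypothetical target table
    .mode("append")
    .save())

Pinning spark.sql.session.timeZone as well keeps the Spark side deterministic regardless of the cluster's default zone.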
Thank you once again for your help.

Regards,
Varun Shah

On Sat, Feb 10, 2024, 04:01 Mich Talebzadeh <[email protected]> wrote:

> Hi Varun,
>
> I am no expert on Snowflake; however, the issue you are facing,
> particularly if it involves data trimming in a COPY statement and a
> potential data mismatch, is likely related to how Snowflake handles data
> ingestion rather than being directly tied to PySpark. The COPY command in
> Snowflake is used to load data from external files (like those in S3) into
> Snowflake tables. Possible causes for data truncation or mismatch include
> differences in data types, column lengths, or encoding between your source
> data and the Snowflake table schema. It could also be related to the way
> your PySpark application formats or provides data to Snowflake.
>
> Check these:
>
> - Schema Matching: Ensure that the data types, lengths, and encoding of
>   the columns in your Snowflake table match the corresponding columns in
>   your PySpark DataFrame.
> - Column Mapping: Explicitly map the columns in your PySpark DataFrame to
>   the corresponding columns in the Snowflake table during the write
>   operation. This can help avoid any implicit mappings that might be
>   causing issues.
>
> HTH
>
> Mich Talebzadeh,
> Dad | Technologist | Solutions Architect | Engineer
> London
> United Kingdom
>
> view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
> https://en.everybodywiki.com/Mich_Talebzadeh
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
> On Fri, 9 Feb 2024 at 13:06, Varun Shah <[email protected]> wrote:
>
>> Hi Team,
>>
>> We currently have a PySpark streaming application implemented on
>> Databricks, where we read data from S3 and write to a Snowflake table
>> using the Snowflake connector jars (net.snowflake:snowflake-jdbc v3.14.5
>> and net.snowflake:spark-snowflake v2.12:2.14.0-spark_3.3).
>>
>> We are currently facing an issue where, if we supply a large number of
>> columns, the data in the COPY statement gets trimmed, and we are therefore
>> unable to write to Snowflake because of the resulting data mismatch.
>>
>> We are using Databricks 11.3 LTS with Spark 3.3.0 and Scala 2.12.
>>
>> Can you please help on how I can resolve this issue? I tried searching
>> online, but did not find any articles on it.
>>
>> Looking forward to hearing from you.
>>
>> Regards,
>> Varun Shah
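For completeness, the explicit column mapping suggested above can be expressed with the connector's columnmap write option, which maps Spark column names (left side) to Snowflake column names (right side). A minimal sketch with hypothetical table and column names, reusing the sf_options connection dict from the earlier sketch:

# "columnmap" maps Spark DataFrame columns -> Snowflake table columns.
# All names below are hypothetical examples.
(df.select("event_id", "event_ts", "payload")  # fix the column set explicitly
    .write
    .format("net.snowflake.spark.snowflake")
    .options(**sf_options)                     # connection dict from the sketch above
    .option("dbtable", "EVENTS")               # hypothetical target table
    .option("columnmap",
            "Map(event_id -> EVENT_ID, event_ts -> EVENT_TS, payload -> PAYLOAD)")
    .mode("append")
    .save())

Selecting the columns explicitly before the write also rules out any implicit ordering assumptions in the generated COPY statement.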
