Hi Mich,

Thanks for the suggestions. I checked the documentation regarding the data type issue and found that the mismatch was caused by different time zone settings between Spark and Snowflake. Specifying the time zone in the Spark options while writing the data to Snowflake worked 😁

Documentation link: https://docs.snowflake.com/en/user-guide/spark-connector-use#working-with-timestamps-and-time-zones
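For reference, this is roughly what the working write looks like. It is a minimal sketch only (simplified to a batch write for brevity; connection details, the S3 path, and the table name are placeholders), using the connector's sfTimezone option described on the page above:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("s3-to-snowflake").getOrCreate()

# Pin the Spark session time zone so timestamps are interpreted
# consistently before they reach the connector.
spark.conf.set("spark.sql.session.timeZone", "UTC")

df = spark.read.parquet("s3://my-bucket/input/")  # hypothetical source path

sf_options = {
    "sfURL": "<account>.snowflakecomputing.com",  # placeholder connection
    "sfUser": "<user>",                           # details only
    "sfPassword": "<password>",
    "sfDatabase": "<database>",
    "sfSchema": "<schema>",
    "sfWarehouse": "<warehouse>",
    # The fix: tell the connector which time zone to use for TIMESTAMP
    # columns. Accepted values include "spark", "snowflake", "sf_current",
    # or an explicit zone ID such as "America/New_York".
    "sfTimezone": "spark",
}

(df.write
    .format("net.snowflake.spark.snowflake")  # "snowflake" also works on Databricks
    .options(**sf_options)
    .option("dbtable", "TARGET_TABLE")        # hypothetical target table
    .mode("append")
    .save())

Pinning spark.sql.session.timeZone as well keeps the Spark side deterministic regardless of the cluster's default zone.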
Thank you once again for your help.

Regards,
Varun Shah

On Sat, Feb 10, 2024, 04:01 Mich Talebzadeh <[email protected]> wrote:

> Hi Varun,
>
> I am no expert on Snowflake; however, the issue you are facing,
> particularly if it involves data trimming in a COPY statement and a
> potential data mismatch, is likely related to how Snowflake handles data
> ingestion rather than being directly tied to PySpark. The COPY command in
> Snowflake is used to load data from external files (like those in S3) into
> Snowflake tables. Possible causes for data truncation or mismatch include
> differences in data types, column lengths, or encoding between your source
> data and the Snowflake table schema. It could also be related to the way
> your PySpark application formats or provides data to Snowflake.
>
> Check these:
>
> - Schema Matching: Ensure that the data types, lengths, and encoding of
>   the columns in your Snowflake table match the corresponding columns in
>   your PySpark DataFrame.
> - Column Mapping: Explicitly map the columns in your PySpark DataFrame to
>   the corresponding columns in the Snowflake table during the write
>   operation. This can help avoid any implicit mappings that might be
>   causing issues.
>
> HTH
>
> Mich Talebzadeh,
> Dad | Technologist | Solutions Architect | Engineer
> London
> United Kingdom
>
> view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
> https://en.everybodywiki.com/Mich_Talebzadeh
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
> On Fri, 9 Feb 2024 at 13:06, Varun Shah <[email protected]> wrote:
>
>> Hi Team,
>>
>> We currently have a PySpark streaming application implemented on
>> Databricks, where we read data from S3 and write to a Snowflake table
>> using the Snowflake connector jars (net.snowflake:snowflake-jdbc v3.14.5
>> and net.snowflake:spark-snowflake v2.12:2.14.0-spark_3.3).
>>
>> We are currently facing an issue where, if we supply a large number of
>> columns, the data in the COPY statement gets trimmed, and we are therefore
>> unable to write to Snowflake because of the resulting data mismatch.
>>
>> We are using Databricks 11.3 LTS with Spark 3.3.0 and Scala 2.12.
>>
>> Can you please help on how I can resolve this issue? I tried searching
>> online, but did not find any articles on it.
>>
>> Looking forward to hearing from you.
>>
>> Regards,
>> Varun Shah
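For completeness, the explicit column mapping suggested above can be expressed with the connector's columnmap write option, which maps Spark column names (left side) to Snowflake column names (right side). A minimal sketch with hypothetical table and column names, reusing the sf_options connection dict from the earlier sketch:

# "columnmap" maps Spark DataFrame columns -> Snowflake table columns.
# All names below are hypothetical examples.
(df.select("event_id", "event_ts", "payload")  # fix the column set explicitly
    .write
    .format("net.snowflake.spark.snowflake")
    .options(**sf_options)                     # connection dict from the sketch above
    .option("dbtable", "EVENTS")               # hypothetical target table
    .option("columnmap",
            "Map(event_id -> EVENT_ID, event_ts -> EVENT_TS, payload -> PAYLOAD)")
    .mode("append")
    .save())

Selecting the columns explicitly before the write also rules out any implicit ordering assumptions in the generated COPY statement.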
