Oops, so Spark does not support Parquet V2 at the moment? We have a use case that requires Parquet V2, because one of our components uses it.
On Mon, Apr 15, 2024 at 7:09 PM Ryan Blue <b...@tabular.io> wrote:

> Hi Prem,
>
> Parquet v1 is the default because v2 has not been finalized and adopted by
> the community. I highly recommend not using v2 encodings at this time.
>
> Ryan
>
> On Mon, Apr 15, 2024 at 3:05 PM Prem Sahoo <prem.re...@gmail.com> wrote:
>
>> I am using Spark 3.2.0, but my Spark package comes with parquet-mr 1.2.1,
>> which writes Parquet version 1, not version 2 :(. So I was looking for
>> how to write Parquet version 2.
>>
>> On Mon, Apr 15, 2024 at 5:05 PM Mich Talebzadeh
>> <mich.talebza...@gmail.com> wrote:
>>
>>> Sorry, you have a point there. It was released in version 3.0.0. What
>>> version of Spark are you using?
>>>
>>> Technologist | Solutions Architect | Data Engineer | Generative AI
>>> London
>>> United Kingdom
>>>
>>> view my Linkedin profile
>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>
>>> https://en.everybodywiki.com/Mich_Talebzadeh
>>>
>>> *Disclaimer:* The information provided is correct to the best of my
>>> knowledge but of course cannot be guaranteed. It is essential to note
>>> that, as with any advice, quote "one test result is worth one-thousand
>>> expert opinions (Wernher von Braun
>>> <https://en.wikipedia.org/wiki/Wernher_von_Braun>)".
>>>
>>> On Mon, 15 Apr 2024 at 21:33, Prem Sahoo <prem.re...@gmail.com> wrote:
>>>
>>>> Thank you so much for the info! But do we have any release notes that
>>>> say Spark 2.4.0 onwards supports Parquet version 2? I was under the
>>>> impression that support started with Spark 3.0.
>>>>
>>>> On Mon, Apr 15, 2024 at 4:28 PM Mich Talebzadeh
>>>> <mich.talebza...@gmail.com> wrote:
>>>>
>>>>> Well, if I am correct, Parquet version 2 support was introduced in
>>>>> Spark version 2.4.0. Therefore, any version of Spark starting from 2.4.0
>>>>> supports Parquet version 2. Assuming that you are using Spark version
>>>>> 2.4.0 or later, you should be able to take advantage of Parquet version 2
>>>>> features.
>>>>>
>>>>> HTH
>>>>>
>>>>> Mich Talebzadeh
>>>>>
>>>>> On Mon, 15 Apr 2024 at 20:53, Prem Sahoo <prem.re...@gmail.com> wrote:
>>>>>
>>>>>> Thank you for the information!
>>>>>> I can use any version of parquet-mr to produce a Parquet file.
>>>>>>
>>>>>> Regarding the 2nd question:
>>>>>> Which version of Spark supports Parquet version 2?
>>>>>> May I get the release notes where Parquet versions are mentioned?
>>>>>>
>>>>>> On Mon, Apr 15, 2024 at 2:34 PM Mich Talebzadeh
>>>>>> <mich.talebza...@gmail.com> wrote:
>>>>>>
>>>>>>> Parquet-mr is a Java library that provides functionality for
>>>>>>> working with Parquet files with Hadoop. It is therefore more geared
>>>>>>> towards working with Parquet files within the Hadoop ecosystem,
>>>>>>> particularly using MapReduce jobs. There is no definitive way to check
>>>>>>> exact compatible versions within the library itself. However, you can
>>>>>>> have a look at this:
>>>>>>>
>>>>>>> https://github.com/apache/parquet-mr/blob/master/CHANGES.md
>>>>>>>
>>>>>>> HTH
>>>>>>>
>>>>>>> Mich Talebzadeh
>>>>>>>
>>>>>>> On Mon, 15 Apr 2024 at 18:59, Prem Sahoo <prem.re...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hello Team,
>>>>>>>> May I know how to check which version of Parquet is supported by
>>>>>>>> parquet-mr 1.2.1?
>>>>>>>>
>>>>>>>> Which version of parquet-mr supports Parquet version 2 (V2)?
>>>>>>>>
>>>>>>>> Which version of Spark supports Parquet version 2?
>>>>>>>> May I get the release notes where Parquet versions are mentioned?
>>>>>>>>
>
> --
> Ryan Blue
> Tabular
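
For anyone who still needs to experiment with v2 output despite the advice in the thread, here is a minimal PySpark sketch of how the writer version is usually requested. It relies on two things I believe to be true but have not verified against every release: parquet-mr reads the Hadoop property `parquet.writer.version` (accepting `v1`/`v2` or the enum names `PARQUET_1_0`/`PARQUET_2_0`), and Spark forwards any `spark.hadoop.*` setting into the Hadoop configuration. The helper names `writer_version_conf` and `write_parquet_v2`, and the output path, are hypothetical and only for illustration.

```python
"""Sketch: request the parquet-mr v2 writer from PySpark (assumptions noted inline)."""

# Hadoop property that parquet-mr's output format reads (assumed, see lead-in).
WRITER_VERSION_KEY = "parquet.writer.version"

# Short names and enum names mapped to the short form passed on to parquet-mr.
_ALIASES = {"v1": "v1", "v2": "v2", "PARQUET_1_0": "v1", "PARQUET_2_0": "v2"}


def writer_version_conf(version: str = "v2") -> dict:
    """Return the Spark conf entry that forwards the writer version to Hadoop."""
    if version not in _ALIASES:
        raise ValueError(f"unknown Parquet writer version: {version!r}")
    # Spark copies every "spark.hadoop.*" entry into the Hadoop configuration.
    return {f"spark.hadoop.{WRITER_VERSION_KEY}": _ALIASES[version]}


def write_parquet_v2(path: str) -> None:
    """Write a tiny DataFrame with the v2 writer; `path` is a hypothetical output dir."""
    from pyspark.sql import SparkSession  # lazy import: needs pyspark installed

    builder = SparkSession.builder.appName("parquet-v2-sketch")
    for key, value in writer_version_conf("v2").items():
        builder = builder.config(key, value)
    spark = builder.getOrCreate()  # the setting applies when the session is created
    spark.range(10).write.mode("overwrite").parquet(path)
    spark.stop()
```

The same setting can be passed on the command line as `--conf spark.hadoop.parquet.writer.version=v2`. Whatever reads the files afterwards has to understand v2 pages and encodings, so it is worth testing the consuming component before relying on this.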