Hello Team,

Do you know which version of the parquet-mr jar contains the Parquet V2 encoding code?
On Sun, Apr 21, 2024 at 6:21 PM Prem Sahoo <prem.re...@gmail.com> wrote:

> Thanks, Vinoo, for the valuable information.
>
> On Sat, Apr 20, 2024 at 5:07 PM Vinoo Ganesh <vinoo.gan...@gmail.com> wrote:
>
>> Hi Prem, maybe I can help clarify to the best of my knowledge. Parquet V2
>> as a standard isn't finalized just yet. That means there is no formal,
>> *finalized* "contract" that specifies what it means to write data in the
>> V2 version. The discussions about what the final V2 standard may be are
>> still in progress and evolving.
>>
>> That being said, because V2 code does exist (though unfinalized), there
>> are clients and tools that write data in the unfinalized V2 format, as
>> seems to be the case with Dremio.
>>
>> Now, as the comment you quoted said, you can have Spark write V2 files,
>> but it's worth being mindful that V2 is a moving target and can (and
>> likely will) change. You can override parquet.writer.version to specify
>> your desired version, but it can be dangerous to produce data in a
>> moving-target format. For example, say you write a lot of data in
>> Parquet V2, and then the community decides to make a breaking change
>> (which is completely fine / allowed since V2 isn't finalized). You are
>> then left dealing with a potentially large and complicated file format
>> update. That's why writing files in Parquet V2 isn't recommended just yet.
>>
>> On Wed, Apr 17, 2024 at 3:47 PM Prem Sahoo <prem.re...@gmail.com> wrote:
>>
>> > Hello Team,
>> > I am working on different products such as Spark and Dremio.
>> >
>> > Dremio is able to write and read Parquet V2, and due to this upgrade it
>> > works faster than with Parquet V1 files.
>> >
>> > In the case of Spark, it still defaults to Parquet V1, and when I
>> > checked with the Spark community they told me the Parquet community
>> > isn't recommending Parquet V2:
>> >
>> > "Prem, as I said earlier, v2 is not a finalized spec so you should not
>> > use it. That's why it is not the default. You can get Spark to write v2
>> > files, but it isn't recommended by the Parquet community."
>> >
>> > Please advise.