Hello Team,
Do you know in which version of the parquet-mr jar the Parquet V2 encoding
code is available?

On Sun, Apr 21, 2024 at 6:21 PM Prem Sahoo <prem.re...@gmail.com> wrote:

> Thanks, Vinoo, for the valuable information.
>
> On Sat, Apr 20, 2024 at 5:07 PM Vinoo Ganesh <vinoo.gan...@gmail.com> wrote:
>
>> Hi Prem, maybe I can help clarify, to the best of my knowledge. Parquet V2
>> as a standard isn't finalized just yet. That means there is no formal,
>> *finalized* "contract" that specifies what it means to write data in the
>> V2 version. The discussions about what the final V2 standard may be are
>> still in progress and evolving.
>>
>> That being said, because V2 code does exist (though unfinalized), there
>> are clients and tools that are writing data in the unfinalized V2 format,
>> as appears to be the case with Dremio.
>>
>> Now, as the comment you quoted said, you can have Spark write V2 files,
>> but it's worth being mindful that V2 is a moving target and can (and
>> likely will) change. You can override parquet.writer.version to specify
>> your desired version, but it can be dangerous to produce data in a
>> moving-target format. For example, let's say you write a bunch of data in
>> Parquet V2, and then the community decides to make a breaking change
>> (which is completely fine and allowed, since V2 isn't finalized). You are
>> now left having to deal with a potentially large and complicated file
>> format update. That's why it's not recommended to write files in Parquet
>> V2 just yet.
>>
>> On Wed, Apr 17, 2024 at 3:47 PM Prem Sahoo <prem.re...@gmail.com> wrote:
>>
>> > Hello Team,
>> > I am working on different products such as Spark and Dremio.
>> >
>> > Dremio is able to write and read Parquet V2, and due to this upgrade it
>> > is faster than with Parquet V1 files.
>> >
>> > In the case of Spark, it still defaults to Parquet V1, and when I
>> > checked with the Spark community they told me the Parquet community
>> > isn't recommending Parquet V2.
>> >
>> > "Prem, as I said earlier, v2 is not a finalized spec so you should not use
>> > it. That's why it is not the default. You can get Spark to write v2 files,
>> > but it isn't recommended by the Parquet community."
>> >
>> > Please advise.
>> >
>>
>
