Yes, I have taken in all the information you shared. But how do we differentiate between Parquet files written through V2 and V1? No one in the community seems to have a clear answer, which is a bit astonishing.
If anyone is aware, it would be highly appreciated.

On Thu, Apr 25, 2024 at 10:32 AM Gábor Szádovszky <ga...@apache.org> wrote:

> I am not sure what "Parquet community V2 is not final yet" means. We are
> now at parquet-format 2.10.0. The current parquet-mr supports most (if not
> all) of its features. I agree the current mechanism in parquet-mr of
> setting the writer version PARQUET_1_0 and PARQUET_2_0 is not
> clear/misleading. We should work on this from the format point of view as
> well. BUT, there is no such thing as "finalizing" parquet-format V2.
>
> AFAIK Spark does support setting the writer version since it uses
> parquet-mr. Try the hadoop configuration "parquet.writer.version" set to
> "v2". Of course, it also supports reading these files by default.
>
> Prem Sahoo <prem.re...@gmail.com> wrote (on Wed, Apr 24, 2024, 14:05):
>
> > Hello Gang/Team,
> > Thanks for your reply.
> > As per your suggestion, there is nothing to differentiate whether a
> > Parquet file was written through V2 or V1, which is very confusing.
> > We should have some flag or tag which differentiates Parquet written in
> > V1 or V2. Then, if the reading engine doesn't support V2, we can be sure
> > we shouldn't feed it V2 Parquet.
> >
> > Now a few tools/products are using Parquet V2 for both reading & writing,
> > but *Apache Spark is not supporting write through V2 encoding as per
> > Parquet community V2 is not final yet*.
> >
> > Do we have any date when the parquet-mr jar will have Parquet V2 writing
> > functionality so that Spark can adhere to it?
> >
> > On Wed, Apr 24, 2024 at 1:28 AM Gang Wu <ust...@gmail.com> wrote:
> >
> > > As I have said in another thread, Parquet V2 is a concept which
> > > contains a lot of features. FWIW, what is defined in the specs [1] is
> > > finalized and some of those features have been implemented in various
> > > implementations. Any file that contains one or more of those features
> > > can be considered v2, but the community has never defined a formal
> > > approach to distinguish between v1 and v2. Parquet does have a field
> > > in the footer thrift definition to mark the file version [2]. However,
> > > not all implementations populate it correctly and some engines will
> > > even throw if the version is not 1. To avoid confusion, I strongly
> > > suggest not using any v2 feature in your case unless you are 100% sure
> > > that all your tools support the v2 feature set you have enabled.
> > >
> > > [1] https://github.com/apache/parquet-format/blob/master/CHANGES.md
> > > [2] https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L1111
> > >
> > > Best,
> > > Gang
> > >
> > > On Wed, Apr 24, 2024 at 10:29 AM Prem Sahoo <prem.re...@gmail.com> wrote:
> > >
> > > > Could anyone please shed some light on this?
> > > > Sent from my iPhone
> > > >
> > > > > On Apr 23, 2024, at 4:30 PM, Prem Sahoo <prem.re...@gmail.com> wrote:
> > > > >
> > > > > Hello Team,
> > > > > How can I find out whether a Parquet file is V1 or V2?
> > > > >
> > > > > Do we have any tag/identifier which says whether a Parquet file
> > > > > was created through V2 or V1?
> > > > >
> > > > > Are there any specific properties that need to be set so that a
> > > > > file is written in Parquet V2?
> > > > > Sent from my iPhone
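
For anyone who wants to look at the footer version field Gang refers to in [2], here is a minimal sketch in plain Python (standard library only). It is not part of parquet-mr or any other library: `footer_version` and `read_varint` are hypothetical helper names. It assumes a plaintext (unencrypted) footer ending in the "PAR1" magic, and that the writer serialized the FileMetaData fields in id order with the Thrift compact protocol, so the footer starts with the `version` field. As noted in the thread, this field is not populated consistently, so a value of 1 does not prove that a file contains no v2 features.

```python
import io
import struct

MAGIC = b"PAR1"  # plaintext-footer magic; encrypted footers are not handled here


def read_varint(buf, pos):
    """Decode an unsigned LEB128 varint at buf[pos]; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def footer_version(stream):
    """Return the FileMetaData.version integer from a seekable binary stream.

    Assumption: the footer's first Thrift field is `version` (field id 1, i32).
    """
    # The file ends with: <footer bytes> <4-byte little-endian length> "PAR1".
    stream.seek(-8, io.SEEK_END)
    tail = stream.read(8)
    if len(tail) != 8 or tail[4:] != MAGIC:
        raise ValueError("not a Parquet file with a plaintext footer")
    (footer_len,) = struct.unpack("<I", tail[:4])

    stream.seek(-(8 + footer_len), io.SEEK_END)
    footer = stream.read(footer_len)

    # Compact-protocol field header: high nibble = field-id delta, low nibble = type.
    # 0x15 therefore means "field id 1, type i32".
    if not footer or footer[0] != 0x15:
        raise ValueError("footer does not start with the version field")

    raw, _ = read_varint(footer, 1)
    return (raw >> 1) ^ -(raw & 1)  # zigzag decode to a signed integer
```

To try it on a real file, open the file in binary mode and pass the handle, for example `with open(path, "rb") as f: print(footer_version(f))`, where `path` is whatever file you want to inspect. Treat the result as a hint only, not as a reliable V1/V2 indicator.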