That's not Parquet itself, but how databases structure their tables.

The classic Hive directory structure can be scanned with a tree walk:

Path dir = new Path("url-to-directory-here");
FileSystem fs = FileSystem.get(dir.toUri(), new Configuration());

FileStatus[] statuses = fs.listStatus(dir);

// recurse into each entry whose status is a directory

If you just want a list of the actual files, then fs.listFiles(dir, true)
gives you them exclusively (as a RemoteIterator<LocatedFileStatus>), with
maximum performance on both HDFS and AWS S3.
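
For the original question (recovering partition fields and values): once you
have the paths, the partition columns are just the key=value directory
segments of the Hive layout. A minimal sketch, pure string parsing with no
Hadoop dependency; the PartitionParser class and parse method names are mine
for illustration, not part of any Hadoop or Parquet API:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PartitionParser {

    // Walks the path segments and collects every "key=value" pair,
    // which is how Hive lays out partition directories.
    // Insertion order of the map mirrors directory nesting order.
    public static Map<String, String> parse(String path) {
        Map<String, String> partitions = new LinkedHashMap<>();
        for (String segment : path.split("/")) {
            int eq = segment.indexOf('=');
            if (eq > 0) {
                partitions.put(segment.substring(0, eq),
                               segment.substring(eq + 1));
            }
        }
        return partitions;
    }

    public static void main(String[] args) {
        Map<String, String> p =
            parse("/warehouse/sales/year=2025/month=06/part-00000.parquet");
        System.out.println(p); // {year=2025, month=06}
    }
}
```

Feed each path returned by fs.listFiles(dir, true) through something like
this and you have the partition fields and values per file; note that Hive
may URL-encode special characters in partition values, which this sketch
does not decode.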

Iceberg tables are a different issue; ask on the users mailing list there.



On Sat, 7 Jun 2025 at 11:34, Selim S <mk1853...@gmail.com> wrote:

> Hello Micah,
>
> Thanks, actually I want to extract the partition fields and values from a
> directory structure given a *org.apache.hadoop.conf.Configuration*;
> and a *dirPath* (string) where the parquet partitions reside.
> And I want to do that using Java API. What is the best, most efficient
> approach to do that?
>
> Thanks.
> Regards.
>
> Le sam. 7 juin 2025 à 06:53, Micah Kornfield <emkornfi...@gmail.com> a
> écrit :
>
> > Hi Selim,
> > The Parquet file itself does not have a notion of partitioning.
> Similarly,
> > parquet itself does not store a last modification date.  Could you expand
> > on your use case for what you are trying to accomplish?
> >
> > Thanks,
> > Micah
> >
> > On Wed, May 28, 2025 at 10:43 AM Selim S <mk1853...@gmail.com> wrote:
> >
> > > Hello - I would like to ask how to detect the partition fields and the
> > > partition  values of a partitioned Parquet file using the Java API?
> > >
> > > How to get last modification date of a Parquet file using the same API?
> > >
> > > Thank you. Best regards.
> > >
> >
>