That's not Parquet itself, but how the databases structure their tables. The classic Hive directory structure can be scanned with a tree walk:
  Path dir = new Path("url-to-directory-here");
  FileSystem fs = FileSystem.get(dir.toUri(), new Configuration());
  FileStatus[] statuses = fs.listStatus(dir);
  // recurse into each status that is a directory

If you just want a list of the actual files, then fs.listFiles(dir, true)
gives you them exclusively, with maximum performance on both HDFS and AWS
S3. Iceberg tables are a different matter; ask on that project's user
mailing list.

On Sat, 7 Jun 2025 at 11:34, Selim S <mk1853...@gmail.com> wrote:

> Hello Micah,
>
> Thanks, actually I want to extract the partition fields and values from a
> directory structure, given an *org.apache.hadoop.conf.Configuration* and
> a *dirPath* (string) where the Parquet partitions reside.
> And I want to do that using the Java API. What is the best, most
> efficient approach to do that?
>
> Thanks.
> Regards.
>
> On Sat, Jun 7, 2025 at 06:53, Micah Kornfield <emkornfi...@gmail.com>
> wrote:
>
> > Hi Selim,
> > The Parquet file itself does not have a notion of partitioning.
> > Similarly, Parquet itself does not store a last modification date.
> > Could you expand on your use case for what you are trying to
> > accomplish?
> >
> > Thanks,
> > Micah
> >
> > On Wed, May 28, 2025 at 10:43 AM Selim S <mk1853...@gmail.com> wrote:
> >
> > > Hello - I would like to ask how to detect the partition fields and
> > > the partition values of a partitioned Parquet file using the Java
> > > API?
> > >
> > > How do I get the last modification date of a Parquet file using the
> > > same API?
> > >
> > > Thank you. Best regards.
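
To make the partition-extraction part concrete: once the tree walk above has given you file paths, the Hive-style key=value segments can be parsed with plain string handling. This is a minimal sketch with no Hadoop dependency; the class and method names (PartitionParser, partitionsOf) and the example paths are my own, not part of any Parquet or Hadoop API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PartitionParser {
    // Pull Hive-style partition segments (key=value) out of a file path,
    // relative to the table root directory. Order of keys is preserved.
    static Map<String, String> partitionsOf(String tableRoot, String filePath) {
        Map<String, String> parts = new LinkedHashMap<>();
        String rel = filePath.startsWith(tableRoot)
                ? filePath.substring(tableRoot.length())
                : filePath;
        for (String seg : rel.split("/")) {
            int eq = seg.indexOf('=');
            if (eq > 0) {                      // skip plain dirs and file names
                parts.put(seg.substring(0, eq), seg.substring(eq + 1));
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        Map<String, String> p = partitionsOf(
                "/warehouse/sales",
                "/warehouse/sales/year=2025/month=06/part-0000.parquet");
        System.out.println(p); // prints {year=2025, month=06}
    }
}
```

The same parse works on the path strings returned by fs.listFiles(dir, true), since Hadoop Path.toString() keeps the key=value segments intact.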