Hello Folks, Probably a repeat, so my apologies in advance.
Is there any appetite for a Parquet 2.0? In my mind, the greatest need is to cut the dependency on Hadoop and allow the Parquet file format to exist on its own.

I was recently considering a project in which a lightweight, stand-alone application reads Iceberg table (Parquet) data. My use case involves many readers on slow-moving data: essentially a mini HBase-like client that can read either from S3 or a local file system. I started putting together a quick PoC and had forgotten just how many Hadoop JARs (and their dependencies) I needed to carry along. I also hit a snag trying to test on a Windows work laptop, because the Hadoop file I/O libraries require some sort of specialized native binary shim there.

So the main goal of version 2 would be to develop the Parquet library as a stand-alone, pure-Java framework, with the other packages (e.g., Hadoop, protobuf) offered as optional extensions. The package structure would be something like (a rough API sketch follows the list):

- parquet-api (InputSource, ParquetReader, ParquetWriter, etc.)
- parquet-core (the actual Parquet framework)
- parquet-hadoop (e.g., simple InputSource implementation, splitters, etc.)
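To make the idea concrete, here is a rough sketch of the kind of abstraction parquet-api could expose. This is purely hypothetical; InputSource and LocalFileInputSource as written below do not exist in Parquet today. The point is only that the read path can be expressed against the JDK alone, with Hadoop kept behind an extension module:

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    /** Hypothetical random-access byte source; stands in for Hadoop's
        FileSystem/FSDataInputStream pair in the core read path. */
    interface InputSource extends AutoCloseable {
        long length() throws IOException;

        /** Reads exactly 'length' bytes starting at 'offset'. */
        ByteBuffer readFully(long offset, int length) throws IOException;

        @Override
        void close() throws IOException;
    }

    /** Pure-JDK implementation that could ship in parquet-core:
        no native shims needed, so it runs fine on Windows. */
    final class LocalFileInputSource implements InputSource {
        private final FileChannel channel;

        LocalFileInputSource(Path path) throws IOException {
            this.channel = FileChannel.open(path, StandardOpenOption.READ);
        }

        @Override
        public long length() throws IOException {
            return channel.size();
        }

        @Override
        public ByteBuffer readFully(long offset, int length) throws IOException {
            ByteBuffer buf = ByteBuffer.allocate(length);
            // Positional reads never move the channel's own position,
            // so one InputSource can serve concurrent column readers.
            while (buf.hasRemaining()) {
                int n = channel.read(buf, offset + buf.position());
                if (n < 0) {
                    throw new IOException(
                        "EOF before " + length + " bytes at offset " + offset);
                }
            }
            buf.flip();
            return buf;
        }

        @Override
        public void close() throws IOException {
            channel.close();
        }
    }

An S3- or Hadoop-backed InputSource would then live in its own extension module (e.g., parquet-hadoop), and readers on a Windows laptop or in a slim container would never need the Hadoop classpath at all.

Thanks.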