Hi Gabor,

Thanks a lot for the quick reply. I've been looking into AvroParquetReader this morning and it looks like a much better fit for my problem:

    ParquetReader<Object> pReader = AvroParquetReader.builder(localInputFile).build();
    for (GenericData.Record value = (GenericData.Record) pReader.read();
         value != null;
         value = (GenericData.Record) pReader.read()) {
      ...
    }

This seems to give me the data in the format I want.

Regards,

Ben

On 2/11/20, Gabor Szadovszky <ga...@apache.org> wrote:
> Hi Ben,
>
> SimpleRecord is pretty old and has not been upgraded to the newer concepts.
> You need to extend ParquetReader for SimpleRecord. See AvroParquetReader for
> details.
> After you have specified SimpleRecordReader and the related Builder, you
> can add the methods you need.
> I also wanted to highlight that parquet-tools is not really meant to be used
> from code but from the command line. There are no proper unit tests for
> that code and there are no guarantees of backward code compatibility
> between releases. I would not recommend using these code parts in
> production.
>
> Cheers,
> Gabor
>
> On Mon, Feb 10, 2020 at 9:55 PM Ben Watson <benwatson...@gmail.com> wrote:
>
>> Hello,
>>
>> I want to read Parquet records into JSON with Java, and it seems that
>> JsonRecordFormatter is the way to do it (like in
>> https://github.com/apache/parquet-mr/blob/master/parquet-tools/src/main/java/org/apache/parquet/tools/command/CatCommand.java#L84
>> ).
>>
>> Unlike the above example, I want to avoid passing a Hadoop Path object,
>> and instead I want to use the ParquetReader.read(InputFile).build(); builder.
>> However, this returns a ParquetReader<Object>, and not the
>> ParquetReader<SimpleRecord> that I need for JsonRecordFormatter. It looks
>> like I need to insert a new SimpleReadSupport() somewhere, but I can't
>> find any method in that builder that accepts it.
>>
>> I've tried looking for other usages online but haven't had any luck.
>> Any pointers greatly appreciated.
>>
>> Regards,
>>
>> Ben
>>