Also, using the API is a pain because you have to use Hadoop. Various people have found workarounds for this; see, for example, the comments on https://issues.apache.org/jira/browse/PARQUET-1822
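To illustrate that the Hadoop dependency comes from the library API rather than from the file format, here is a minimal sketch that uses only the JDK. It is not the approach from the JIRA comments, and the class name ParquetFooterPeek and its method names are made up for this example. It relies only on the documented file layout: a Parquet file starts and ends with the 4-byte magic "PAR1", and the 4 bytes before the trailing magic hold the footer length as a little-endian integer. It does not decode the Thrift-encoded footer itself.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Hypothetical helper for illustration; not part of parquet-mr.
public class ParquetFooterPeek {
    private static final byte[] MAGIC = "PAR1".getBytes(StandardCharsets.US_ASCII);

    /** Returns the length in bytes of the serialized footer metadata. */
    public static int footerLength(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            // Smallest possible layout: leading magic + footer length + trailing magic.
            if (size < 12) {
                throw new IOException("Too small to be a Parquet file: " + file);
            }

            // First 4 bytes: leading magic.
            ByteBuffer head = ByteBuffer.allocate(4);
            readFully(ch, head, 0);

            // Last 8 bytes: little-endian footer length, then trailing magic.
            ByteBuffer tail = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN);
            readFully(ch, tail, size - 8);

            byte[] trailingMagic = Arrays.copyOfRange(tail.array(), 4, 8);
            if (!Arrays.equals(head.array(), MAGIC) || !Arrays.equals(trailingMagic, MAGIC)) {
                throw new IOException("Missing PAR1 magic: " + file);
            }

            int length = tail.getInt(0);
            // The footer must fit between the leading magic and the 8-byte trailer.
            if (length < 0 || length > size - 12) {
                throw new IOException("Implausible footer length " + length + ": " + file);
            }
            return length;
        }
    }

    // Reads until the buffer is full, starting at the given file position.
    private static void readFully(FileChannel ch, ByteBuffer buf, long pos) throws IOException {
        while (buf.hasRemaining()) {
            int n = ch.read(buf, pos + buf.position());
            if (n < 0) {
                throw new IOException("Unexpected end of file");
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(footerLength(Paths.get(args[0])));
    }
}
```

Running it with a file path as the only argument prints the footer length. Anything beyond that (schema, row groups, column data) still needs a Thrift decoder for the footer and the page decoders, which is what parquet-mr provides.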
I also assembled a minimal reader myself (based on code I found elsewhere on GitHub, which I still need to add attributions for) and put it here: https://github.com/theosib-amazon/parquet-mr-minreader

On 4/25/22, 2:51 PM, "gamaken k" <[email protected]> wrote:

    > wiki on how to use the api

    +1 to this. I too think this would be very useful for getting started.
    Xinyu, you could potentially look at parquet-cli's source code to understand how it invokes the various APIs from parquet-mr, I think.

    On Sun, Apr 24, 2022 at 8:29 AM Xinyu Zeng <[email protected]> wrote:

    > Hi,
    >
    > I am a previous user of parquet-cpp(now integrated with arrow) and now
    > I am going to use the java version parquet-mr. However, I did not find
    > any doc or wiki on how to use the api. I am also interested in
    > contributing but there is also no contribution guide like other open
    > source projects. I would appreciate it if someone could give me a
    > short guide.
    >
    > Thanks,
    > Xinyu
