Hi Raghvendra,

Quick sidebar: please subscribe to the mailing list so your messages get published automatically. :)

On Thu, Feb 13, 2020 at 5:32 PM Raghvendra Dhar Dubey <[email protected]> wrote:

> Hi Udit,
>
> Thanks for information.
> Actually I am struggling on following points
> 1 - How can we process S3 parquet files(hourly partitioned) through Apache
>     Hudi? Is there any streaming layer we need to introduce?
> 2 - Is there any workaround to query Hudi Dataset from Athena? we are
>     thinking to dump resulting Hudi dataset to S3, and then querying from
>     Athena.
> 3 - What should be the parquet file size and row group size for better
>     performance on querying Hudi Dataset?
>
> Thanks
> Raghvendra
>
>
> On Thu, Feb 13, 2020 at 5:05 AM Mehrotra, Udit <[email protected]> wrote:
>
> > Hi Raghvendra,
> >
> > You would have to re-write you Parquet Dataset in Hudi format. Here are
> > the links you can follow to get started:
> >
> > https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hudi-work-with-dataset.html
> > https://hudi.apache.org/docs/querying_data.html#spark-incr-pull
> >
> > Thanks,
> > Udit
> >
> > On 2/12/20, 10:27 AM, "Raghvendra Dhar Dubey"
> > <[email protected]> wrote:
> >
> >     Hi Team,
> >
> >     I want to setup incremental view of my AWS S3 parquet data through
> >     Apache Hudi, and want to query this data through Athena, but
> >     currently Athena not supporting Hudi Dataset.
> >
> >     so there are few questions which I want to understand here
> >
> >     1 - How to stream s3 parquet file to Hudi dataset running on EMR.
> >
> >     2 - How to query Hudi Dataset running on EMR
> >
> >     Please help me to understand this.
> >
> >     Thanks
> >
> >     Raghvendra
