I don't think multi-threading belongs at the level of Parquet itself, but
you could do the writing on a different thread. For example, when one of the
1500 writers is ready to write, you could hand that write off to another thread.
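
A minimal sketch of that idea, not a definitive implementation: each writer
is pinned to one worker thread, so a single writer is never touched by two
threads at once, while different writers run in parallel. `RecordSink` and
`StripedWriteDispatcher` are hypothetical names made up for this
illustration, and `RecordSink` only stands in for a real Parquet writer; no
Parquet API is used here.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a single-threaded writer such as a Parquet writer. */
interface RecordSink {
    void write(String record);
}

/** Routes every write for a given writer to one fixed worker thread. */
final class StripedWriteDispatcher {
    private final ExecutorService[] lanes;

    StripedWriteDispatcher(int threads) {
        lanes = new ExecutorService[threads];
        for (int i = 0; i < threads; i++) {
            // One thread per lane keeps the writes of a lane in submission order.
            lanes[i] = Executors.newSingleThreadExecutor();
        }
    }

    /** Queues a write; the same writerId always lands on the same lane. */
    void submit(int writerId, RecordSink sink, String record) {
        int lane = Math.floorMod(writerId, lanes.length);
        lanes[lane].execute(() -> sink.write(record));
    }

    /** Waits for all queued writes to finish, then stops the worker threads. */
    void close() {
        for (ExecutorService lane : lanes) {
            lane.shutdown();
        }
        try {
            for (ExecutorService lane : lanes) {
                if (!lane.awaitTermination(1, TimeUnit.MINUTES)) {
                    throw new IllegalStateException("writes did not finish in time");
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while draining writes", e);
        }
    }
}
```

The application thread calls `submit(writerId, sink, record)` and moves on.
Note that a single hot schema would still be limited to one thread in this
sketch, so it would need several writers (and writer ids) of its own.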

Cheers, Fokko

On Sat, Oct 19, 2019 at 12:16, Manik Singla <[email protected]> wrote:

> Thanks, Fokko, for the response and for correcting me on the way I
> addressed the group.
>
> We use Parquet through our internal framework, where the schema is usually
> dynamic. Because of that, we do some buffering to figure out the schema for
> the current writer.
> We keep around 1500 writers open at a time, but we cannot achieve the
> throughput we need at times when one particular schema accounts for most of
> the data.
> Though we can handle that by identifying such a schema and creating multiple
> writers for it, I was wondering whether we could increase throughput by
> having multi-threaded support.
>
> Implementing concurrent access would certainly increase locking, but it
> would spare users from having to handle this themselves.
>
>
>
> Regards
> Manik Singla
> +91-9996008893
> +91-9665639677
>
> "Life doesn't consist in holding good cards but playing those you hold
> well."
>
>
> On Thu, Oct 17, 2019 at 7:54 PM Driesprong, Fokko <[email protected]>
> wrote:
>
> > Thank you for your question Manik,
> >
> > First of all, I think most of the people working on this project are
> > guys, but I would not exclude any other gender.
> >
> > Secondly, Parquet is widely used in different open source projects such
> > as Hive, Presto and Spark. These frameworks scale out by design. For
> > example, Spark writes by default 200 files to the persistent store. I
> > think multi-threading (or multi-processing) should be implemented at such
> > a level. For example, you could write multiple Parquet files from your
> > application. Having multiple threads writing to the same file would not
> > make too much sense to me. Please let me know your thoughts on how you
> > see multi-threading within Parquet.
> >
> > Cheers, Fokko
> >
> >
> >
> > On Tue, Oct 15, 2019 at 11:45, Manik Singla <[email protected]> wrote:
> >
> > > Hi Guys
> > >
> > > I was looking for a task list or the blockers that need to be resolved
> > > to support a multi-threaded writer (Java specifically).
> > > I did not find anything in JIRA or the forums.
> > >
> > > Could someone point me to a doc or link, if one exists?
> > >
> > >
> > > Regards
> > > Manik Singla
> > > +91-9996008893
> > > +91-9665639677
> > >
> > > "Life doesn't consist in holding good cards but playing those you hold
> > > well."
> > >
> >
>
