In our case, all 1500 writers have different schemas, so we would need to
increase throughput per writer.
Currently, though, writer throughput is not the application's bottleneck.

As per the suggestion, we will look at application-level fixes if it comes
to that.
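
For reference, a rough sketch of the application-level approach suggested in
this thread: one ParquetWriter per output file, each driven by its own
thread, so no writer is shared and no locking is needed. The schema, paths
and record fields below are only placeholders, not our actual framework code.

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelParquetWriteSketch {

    // Placeholder schema; in our case each writer would get its own.
    private static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Event\",\"fields\":["
            + "{\"name\":\"id\",\"type\":\"long\"},"
            + "{\"name\":\"payload\",\"type\":\"string\"}]}");

    public static void main(String[] args) throws InterruptedException {
        int numFiles = 4; // one output file and one writer per thread
        ExecutorService pool = Executors.newFixedThreadPool(numFiles);

        for (int i = 0; i < numFiles; i++) {
            final int part = i;
            pool.submit(() -> {
                // Each thread owns its own ParquetWriter, so writes never contend.
                Path out = new Path("/tmp/events-part-" + part + ".parquet");
                try (ParquetWriter<GenericRecord> writer =
                         AvroParquetWriter.<GenericRecord>builder(out)
                             .withSchema(SCHEMA)
                             .build()) {
                    for (long id = 0; id < 10_000; id++) {
                        GenericRecord record = new GenericData.Record(SCHEMA);
                        record.put("id", id);
                        record.put("payload", "row-" + id);
                        writer.write(record);
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
        }

        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.MINUTES);
    }
}

In our setup each writer would of course get its own schema, but the
threading pattern stays the same: split hot schemas across more files and
more threads instead of sharing one writer.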



Regards
Manik Singla
+91-9996008893
+91-9665639677

"Life doesn't consist in holding good cards but playing those you hold
well."


On Mon, Oct 21, 2019 at 9:26 PM Ryan Blue <[email protected]> wrote:

> I agree with Fokko. Multi-threading is not the responsibility of Parquet.
> You can parallelize by writing more Parquet files in separate threads.
> Adding locks to Parquet doesn't make much sense and is unlikely to speed up
> your application without huge changes to Parquet.
>
> On Mon, Oct 21, 2019 at 12:14 AM Driesprong, Fokko <[email protected]>
> wrote:
>
> > I don't think the multi-threading should be at the level of Parquet, but
> > you could write on a different thread. For example, when one of the 1500
> > writers is ready to write, you could do so on a different thread.
> >
> > Cheers, Fokko
> >
> > On Sat, Oct 19, 2019 at 12:16, Manik Singla <[email protected]> wrote:
> >
> > > Thanks, Fokko, for the response and for correcting me on the way I
> > > addressed the list.
> > >
> > > We are using Parquet through our internal framework, where we usually
> > > have dynamic schemas. Because of the dynamic schemas, we do some
> > > buffering to figure out the schema for the current writer.
> > > We open around 1500 writers at a time but are not able to achieve the
> > > required throughput at times when one particular schema makes up most of
> > > the data.
> > > Though we could handle that by identifying such schemas and creating
> > > multiple writers for them, I was wondering whether we could increase
> > > throughput by having multi-threaded support.
> > >
> > > For sure, it would add locking if we implemented concurrent access, but
> > > it would leave users carefree.
> > >
> > >
> > >
> > > Regards
> > > Manik Singla
> > > +91-9996008893
> > > +91-9665639677
> > >
> > > "Life doesn't consist in holding good cards but playing those you hold
> > > well."
> > >
> > >
> > > On Thu, Oct 17, 2019 at 7:54 PM Driesprong, Fokko <[email protected]>
> > > wrote:
> > >
> > > > Thank you for your question, Manik.
> > > >
> > > > First of all, I think most of the people working on this project are
> > > > guys, but I would not exclude any other gender.
> > > >
> > > > Secondly, Parquet is widely used in different open source projects such
> > > > as Hive, Presto and Spark. These frameworks scale out by design. For
> > > > example, Spark writes 200 files to the persistent store by default. I
> > > > think multi-threading (or multi-processing) should be implemented at
> > > > such a level. For example, you could write multiple Parquet files from
> > > > your application. Having multiple threads write to the same file would
> > > > not make much sense to me. Please let me know your thoughts on how you
> > > > see multi-threading within Parquet.
> > > >
> > > > Cheers, Fokko
> > > >
> > > >
> > > >
> > > > On Tue, Oct 15, 2019 at 11:45, Manik Singla <[email protected]>
> > > > wrote:
> > > >
> > > > > Hi Guys
> > > > >
> > > > > I was looking for a task list or blockers that would be required to
> > > > > support a multi-threaded writer (Java specifically).
> > > > > I did not find anything in JIRA or on the forums.
> > > > >
> > > > > Could someone point me to a doc/link, if one exists?
> > > > >
> > > > >
> > > > > Regards
> > > > > Manik Singla
> > > > > +91-9996008893
> > > > > +91-9665639677
> > > > >
> > > > > "Life doesn't consist in holding good cards but playing those you
> > hold
> > > > > well."
> > > > >
> > > >
> > >
> >
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>
