I don't think multi-threading belongs at the level of Parquet itself. But you could do the writing on a different thread. For example, when one of the 1500 writers is ready to write, you could hand that write off to a different thread.
Cheers, Fokko

On Sat, 19 Oct 2019 at 12:16, Manik Singla wrote:

> Thanks, Fokko, for the response and for correcting me on the way I
> addressed everyone.
>
> We use Parquet through our internal framework, where we usually have
> dynamic schemas. Because the schema is dynamic, we do some buffering to
> figure out the schema for the current writer.
> We open around 1500 writers at a time, but at times we cannot reach the
> throughput we need when one particular schema accounts for most of the data.
> Although we could handle that by identifying such a schema and creating
> multiple writers for it, I was wondering whether we could increase
> throughput with multi-threaded support.
>
> Implementing concurrent access would certainly add locking, but it would
> leave users carefree.
>
> Regards
> Manik Singla
> +91-9996008893
> +91-9665639677
>
> "Life doesn't consist in holding good cards but playing those you hold
> well."
>
> On Thu, Oct 17, 2019 at 7:54 PM Driesprong, Fokko wrote:
>
> > Thank you for your question, Manik.
> >
> > First of all, I think most of the people working on this project are
> > guys, but I would not exclude any other gender.
> >
> > Secondly, Parquet is widely used in open source projects such as Hive,
> > Presto and Spark. These frameworks scale out by design. For example,
> > Spark by default writes 200 files to the persistent store. I think
> > multi-threading (or multi-processing) should be implemented at that
> > level. For example, you could write multiple Parquet files from your
> > application. Having multiple threads write to the same file would not
> > make much sense to me. Please let me know your thoughts on how you see
> > multi-threading within Parquet.
> >
> > Cheers, Fokko
> >
> > On Tue, 15 Oct 2019 at 11:45, Manik Singla wrote:
> >
> > > Hi Guys
> > >
> > > I was looking for a task list or the blockers that need to be resolved
> > > to support a multi-threaded writer (Java specifically).
> > > I did not find anything in JIRA or the forums.
> > >
> > > Could someone point me to a doc or link, if one exists?
> > >
> > > Regards
> > > Manik Singla
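A minimal Java sketch of the approach suggested in this thread, under stated assumptions: each file writer stays single-threaded and is confined to its own dedicated thread, and a schema that carries most of the data is spread over several files (as Manik proposes) instead of being shared by threads. `RecordSink`, `ThreadConfinedWriter`, `ShardedWriter` and `InMemorySink` are hypothetical names, not parquet-mr API; in a real application each sink would wrap one writer per file (for example a parquet-mr `ParquetWriter`, treated here as not thread-safe). Requires Java 10 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class ThreadPerWriterDemo {

    // Hypothetical stand-in for one non-thread-safe file writer.
    interface RecordSink extends AutoCloseable {
        void write(String record) throws Exception;
    }

    // Confines one sink to one dedicated thread: any thread may submit batches,
    // but only the owning thread ever calls the sink, so the sink needs no locks.
    static final class ThreadConfinedWriter implements AutoCloseable {
        private final RecordSink sink;
        private final ExecutorService owner = Executors.newSingleThreadExecutor();

        ThreadConfinedWriter(RecordSink sink) { this.sink = sink; }

        // Failures surface through the returned Future; this sketch does not check them.
        Future<?> writeAsync(List<String> batch) {
            List<String> copy = List.copyOf(batch); // the caller may reuse its buffer afterwards
            return owner.submit(() -> {
                for (String record : copy) sink.write(record);
                return null;
            });
        }

        @Override
        public void close() throws Exception {
            // A single-thread executor runs tasks in order, so this runs after all queued batches.
            Future<?> closed = owner.submit(() -> { sink.close(); return null; });
            owner.shutdown();
            closed.get(); // rethrows any failure from sink.close()
            owner.awaitTermination(1, TimeUnit.MINUTES);
        }
    }

    // Spreads one "hot" schema over several files, each with its own writer thread.
    static final class ShardedWriter implements AutoCloseable {
        private final List<ThreadConfinedWriter> shards = new ArrayList<>();
        private int next = 0; // round-robin cursor; the router itself is driven by one thread

        ShardedWriter(List<? extends RecordSink> sinks) {
            for (RecordSink sink : sinks) shards.add(new ThreadConfinedWriter(sink));
        }

        Future<?> writeAsync(List<String> batch) {
            ThreadConfinedWriter shard = shards.get(next);
            next = (next + 1) % shards.size();
            return shard.writeAsync(batch);
        }

        @Override
        public void close() throws Exception {
            for (ThreadConfinedWriter shard : shards) shard.close();
        }
    }

    // In-memory sink used only for this demonstration.
    static final class InMemorySink implements RecordSink {
        final List<String> records = new ArrayList<>();
        boolean closed = false;

        @Override public void write(String record) { records.add(record); }
        @Override public void close() { closed = true; }
    }

    public static void main(String[] args) throws Exception {
        InMemorySink a = new InMemorySink();
        InMemorySink b = new InMemorySink();
        try (ShardedWriter hotSchema = new ShardedWriter(List.of(a, b))) {
            hotSchema.writeAsync(List.of("r1", "r2")); // goes to a
            hotSchema.writeAsync(List.of("r3"));       // goes to b
            hotSchema.writeAsync(List.of("r4"));       // goes to a
        }
        System.out.println(a.records + " " + b.records); // prints [r1, r2, r4] [r3]
    }
}
```

The design choice mirrors the trade-off raised above: rather than adding locks inside a writer so that many threads can share one file, each file keeps exactly one writing thread, and throughput for a dominant schema comes from writing several files in parallel.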
