wgtmac commented on PR #968: URL: https://github.com/apache/parquet-mr/pull/968#issuecomment-1322172984
> > > IMO, switching `ioThreadPool` and `processThreadPool` to the reader instance level will make it more flexible.
> >
> > I've changed the thread pool so that it is not initialized by default, but I left them as static members. Ideally, there should be a single IO thread pool that handles all the IO for a process, with the size of the pool determined by the bandwidth of the underlying storage system. Making them per instance is not an issue though. The calling code can decide to set the same thread pool for all instances and achieve the same result. Let me update this.
>
> Also, any changes you want to make are fine with me, and the help is certainly appreciated!
>
> I'm thinking of merging the thread pools into a single `ioThreadPool` and making it settable through `ParquetReadOptions` (like the allocator is). The work being done by the `processThreadPool` is rather small and maybe we can do away with it. Adding the pool via `ParquetReadOptions` makes it easier to use with `ParquetReader` (used a lot in unit tests). WDYT?

Sorry for my late reply. Setting the thread pools via `ParquetReadOptions` is a good idea, and that is exactly how I want to do away with the static members. Merging `ioThreadPool` and `processThreadPool` into a single pool should work as long as the tasks in the `processThreadPool` do not wait for tasks in the `ioThreadPool` to return. I will look into the details later.

BTW, I don't have permission to update your PR in place, as I am not yet a maintainer of the repo. I may need to open a new one by copying what you have done here and adding you as a co-author. WDYT? If that sounds good to you, I can proceed. @parthchandra
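
To make the condition on merging the pools concrete, here is a minimal sketch of the hazard, assuming a single bounded pool shared by both kinds of task. It is not code from this PR: `SharedPoolSketch`, `readBytes`, and `process` are hypothetical stand-ins, and only standard `java.util.concurrent` calls are used. A processing task that blocks on an I/O task queued in the same pool can starve once every worker is occupied; chaining the two stages avoids the blocking wait.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class SharedPoolSketch {

    // Hypothetical stand-in for an I/O task (e.g. fetching the bytes of a page).
    static byte[] readBytes() {
        return new byte[] {1, 2, 3};
    }

    // Hypothetical stand-in for a processing task (e.g. decoding those bytes).
    static int process(byte[] bytes) {
        int sum = 0;
        for (byte b : bytes) {
            sum += b;
        }
        return sum;
    }

    // Hazard: the processing task blocks on an I/O task queued in the SAME pool.
    // Returns true if the inner wait timed out, i.e. the I/O task never got a worker.
    static boolean blockingStyleStarves(ExecutorService pool) throws Exception {
        Callable<byte[]> ioTask = SharedPoolSketch::readBytes;
        Future<Boolean> outer = pool.submit(() -> {
            Future<byte[]> io = pool.submit(ioTask);
            try {
                // A timeout is used so the sketch terminates instead of deadlocking.
                io.get(200, TimeUnit.MILLISECONDS);
                return false;
            } catch (TimeoutException e) {
                io.cancel(true);
                return true;
            }
        });
        return outer.get();
    }

    // Alternative: chain the stages so no worker thread ever waits on another task.
    static int chainedStyle(ExecutorService pool) throws Exception {
        return CompletableFuture
                .supplyAsync(SharedPoolSketch::readBytes, pool)
                .thenApplyAsync(SharedPoolSketch::process, pool)
                .get(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        // One worker makes the starvation deterministic for the demonstration.
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            System.out.println("blocking style starved: " + blockingStyleStarves(pool));
            System.out.println("chained result: " + chainedStyle(pool));
        } finally {
            pool.shutdownNow();
        }
    }
}
```

With a larger pool the blocking style only fails under load, when enough processing tasks are in flight to occupy every worker, which is why the dependency between the two kinds of task is worth checking before the pools are merged.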