[GitHub] [parquet-mr] parthchandra commented on pull request #968: PARQUET-2149: Async IO implementation for ParquetFileReader

GitBox Thu, 17 Nov 2022 16:55:56 -0800


parthchandra commented on PR #968:
URL: https://github.com/apache/parquet-mr/pull/968#issuecomment-1319411064


   > > IMO, switching `ioThreadPool` and `processThreadPool` to the reader 
instance level will make it more flexible.
   > 
   > I've changed the thread pool so that it is not initialized by default but 
I left them as static members. Ideally, there should be a single IO thread pool 
that handles all the IO for a process and the size of the pool is determined by 
the bandwidth of the underlying storage system. Making them per instance is not 
an issue though. The calling code can decide to set the same thread pool for 
all instances and achieve the same result. Let me update this.
   > 
   > Also, any changes you want to make are fine with me, and the help is 
certainly appreciated!
   
   I'm thinking of merging the thread pools into a single `ioThreadPool` and 
making the `ioThreadPool` settable through `ParquetReadOptions` (like the 
allocator is). The work being done by the `processThreadPool` is rather small 
and maybe we can do away with it.
   Adding the pool via `ParquetReadOptions` makes it easier to use with 
`ParquetReader` (used a lot in unit tests).
   WDYT?
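
   A rough sketch of the idea, not the actual parquet-mr API: the classes 
`SketchReadOptions` and `SketchReader`, the method `withIoThreadPool`, and the 
pool size of 4 are all hypothetical stand-ins chosen for illustration. It only 
shows how a caller could pass one pool through the options object so that 
several reader instances share it, and how a reader could fall back to 
synchronous reads when no pool is set.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical stand-in for ParquetReadOptions: carries an optional IO pool.
final class SketchReadOptions {
    private final ExecutorService ioThreadPool; // null means "read synchronously"

    private SketchReadOptions(ExecutorService ioThreadPool) {
        this.ioThreadPool = ioThreadPool;
    }

    static SketchReadOptions withIoThreadPool(ExecutorService pool) {
        return new SketchReadOptions(pool);
    }

    static SketchReadOptions synchronous() {
        return new SketchReadOptions(null);
    }

    ExecutorService ioThreadPool() {
        return ioThreadPool;
    }
}

// Hypothetical stand-in for a file reader that takes its pool from the options.
final class SketchReader {
    private final String path;
    private final SketchReadOptions options;

    SketchReader(String path, SketchReadOptions options) {
        this.path = path;
        this.options = options;
    }

    // Simulates reading one chunk; runs on the configured pool when one is set,
    // otherwise completes on the calling thread.
    CompletableFuture<String> readChunk(int index) {
        Supplier<String> task = () -> path + "#chunk" + index;
        ExecutorService pool = options.ioThreadPool();
        if (pool == null) {
            return CompletableFuture.completedFuture(task.get());
        }
        return CompletableFuture.supplyAsync(task, pool);
    }
}

final class SharedIoPoolSketch {
    public static void main(String[] args) {
        // One pool for the whole process; the size here is arbitrary.
        ExecutorService sharedPool = Executors.newFixedThreadPool(4);
        try {
            SketchReadOptions options = SketchReadOptions.withIoThreadPool(sharedPool);
            // Both readers submit their IO to the same pool via the shared options.
            SketchReader a = new SketchReader("a.parquet", options);
            SketchReader b = new SketchReader("b.parquet", options);
            System.out.println(a.readChunk(0).join());
            System.out.println(b.readChunk(1).join());
        } finally {
            sharedPool.shutdown();
        }
    }
}
```

   With this shape, the calling code decides whether readers share a pool: 
passing the same options object (or the same pool) to every reader gives the 
single-pool behavior, while a test can pass its own pool or none at all.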

