[GitHub] [arrow] youngfn commented on issue #33627: [C++][HDFS] Can't get performance improve when increase the thread number of IO thread pool

GitBox Thu, 12 Jan 2023 17:41:05 -0800


youngfn commented on issue #33627:
URL: https://github.com/apache/arrow/issues/33627#issuecomment-1381201481


   > Do you have multiple CSV files or just one CSV file?

   Yes, multiple files. My test table has 196 files. Even though I've set
   ARROW_IO_THREADS to 200, it runs with only 5 reading threads (I print a log
   line inside the HDFS read and count the distinct threads).
   ![hdfs read 
log](https://user-images.githubusercontent.com/10483852/212216366-3638b1c1-9fa8-4b5e-9d51-a85319019996.png)
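
   For reference, the thread count can be taken roughly as in the sketch below. This is a minimal, standard-library-only illustration, not the actual instrumentation: `ThreadCounter` and `CountThreadsInSimulatedReads` are hypothetical names, and the worker loop only stands in for the real HDFS read path.

```cpp
#include <atomic>
#include <cstddef>
#include <mutex>
#include <set>
#include <thread>
#include <vector>

// Hypothetical helper (not part of Arrow): remembers which threads have
// entered an instrumented function.
class ThreadCounter {
 public:
  // Call at the top of the instrumented read function.
  void Record() {
    std::lock_guard<std::mutex> lock(mutex_);
    ids_.insert(std::this_thread::get_id());
  }

  // Number of distinct threads that have called Record() so far.
  std::size_t DistinctThreads() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return ids_.size();
  }

 private:
  mutable std::mutex mutex_;
  std::set<std::thread::id> ids_;
};

// Stand-in for the real workload (assumption): starts `num_threads` workers
// that each call Record() `calls_per_thread` times, then returns the number
// of distinct threads observed.
std::size_t CountThreadsInSimulatedReads(int num_threads, int calls_per_thread) {
  ThreadCounter counter;
  std::atomic<int> arrived{0};
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&counter, &arrived, num_threads, calls_per_thread] {
      for (int i = 0; i < calls_per_thread; ++i) counter.Record();
      // Keep every worker alive until all have recorded, so that no thread id
      // can be reused by a later worker.
      arrived.fetch_add(1);
      while (arrived.load() < num_threads) std::this_thread::yield();
    });
  }
  for (auto& w : workers) w.join();
  return counter.DistinctThreads();
}
```

   In a real run, `Record()` would be called from inside the read function and `DistinctThreads()` printed once the scan finishes. Counting distinct thread ids this way shows how many threads ever performed a read, not how many reads were in flight at the same moment.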
   
   > Unfortunately, I do not have an HDFS test environment locally so it is
   > hard for me to test / profile HDFS.

   That's OK. I have an HDFS environment, so if you have ideas I can run the
   tests. I just want to figure out why increasing the number of I/O threads
   does not improve performance, and what I should change (which parameters
   in the code).
   
   > You shouldn't need to do anything to get concurrent I/O.

   Yes, I know it should work that way. But right now ARROW_IO_THREADS and
   OMP_NUM_THREADS do not seem to have any effect.
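
   One thing that can be ruled out cheaply is that the variables never reach the process at all. The sketch below reads an environment variable as an integer using only the C++ standard library. `EnvIntOr` is a hypothetical helper written for this check, and it makes no claim about how Arrow itself parses these variables.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>

// Hypothetical helper (not part of Arrow): returns the integer value of the
// environment variable `name`, or `fallback` if it is unset, empty, not a
// plain base-10 integer, or out of range for int.
int EnvIntOr(const char* name, int fallback) {
  const char* raw = std::getenv(name);
  if (raw == nullptr || *raw == '\0') return fallback;
  char* end = nullptr;
  errno = 0;
  const long value = std::strtol(raw, &end, 10);
  if (errno != 0 || *end != '\0' || value < INT_MIN || value > INT_MAX) {
    return fallback;
  }
  return static_cast<int>(value);
}
```

   Logging `EnvIntOr("ARROW_IO_THREADS", -1)` and `EnvIntOr("OMP_NUM_THREADS", -1)` at startup would show whether the settings are visible inside the process that does the reading. Arrow C++ also declares `arrow::io::GetIOThreadPoolCapacity()` in `arrow/io/interfaces.h`, which could be logged next to them to see the size the I/O pool actually ended up with.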
   
   Thank you for replying.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
