youngfn commented on issue #33627: URL: https://github.com/apache/arrow/issues/33627#issuecomment-1381201481
> Do you have multiple CSV files or just one CSV file?

Yes. My test table has 196 files. Even though I've set ARROW_IO_THREADS to 200, it only runs with 5 reading threads (I printed a log inside the HDFS read and counted the distinct threads).

> Unfortunately, I do not have an HDFS test environment locally so it is hard for me to test / profile HDFS.

That's OK. I have an HDFS environment, so if you have ideas, I can run the tests. I just want to figure out WHY I can't get a performance improvement by increasing the number of I/O threads, and what I should do (which parameters I should change in code).

> You shouldn't need to do anything to get concurrent I/O.

Yep, I know it should be that way. But right now ARROW_IO_THREADS and OMP_NUM_THREADS just don't seem to have any effect.

Thank you for replying.
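The thread count in this comment was measured by logging inside the HDFS read path and counting distinct threads. A minimal, self-contained Python analog of that measurement is sketched below. No Arrow code is involved: `ReadTracker`, `fake_read`, `run`, and `max_in_flight` are hypothetical names used only for illustration, and the sketch assumes the caller caps how many reads are outstanding at once. In that toy model, raising the pool size alone cannot push concurrency above the cap.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ReadTracker:
    """Counts reads, the distinct threads that ran them, and peak concurrency."""

    def __init__(self):
        self._lock = threading.Lock()
        self.thread_ids = set()
        self.reads = 0
        self.in_flight = 0
        self.peak = 0

    def fake_read(self, path):
        # Hypothetical stand-in for a per-file HDFS read; no real I/O happens.
        with self._lock:
            self.thread_ids.add(threading.get_ident())
            self.reads += 1
            self.in_flight += 1
            self.peak = max(self.peak, self.in_flight)
        time.sleep(0.02)  # simulate I/O latency
        with self._lock:
            self.in_flight -= 1
        return path


def run(num_files, pool_size, max_in_flight):
    """Read num_files simulated files on a pool of pool_size threads,
    keeping at most max_in_flight reads outstanding at any time."""
    tracker = ReadTracker()
    paths = [f"part-{i}.csv" for i in range(num_files)]
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        for start in range(0, num_files, max_in_flight):
            wave = paths[start:start + max_in_flight]
            # Block until this wave finishes before submitting the next one.
            list(pool.map(tracker.fake_read, wave))
    return tracker


if __name__ == "__main__":
    t = run(num_files=196, pool_size=200, max_in_flight=5)
    print("reads:", t.reads)
    print("distinct reader threads:", len(t.thread_ids))
    print("peak concurrent reads:", t.peak)
```

Peak concurrency here is bounded by min(pool_size, max_in_flight). The distinct-thread count may drift slightly above max_in_flight, because the executor can start a fresh thread before a finished one is marked idle. That is why the sketch also tracks peak concurrency, which is the more reliable signal of how much I/O actually overlaps.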