I looked into the Python GIL documentation. It looks like, even with the GIL, multiple Python threads can still help IO-bound (or at least non-CPU-bound) tasks. We could limit the number of workers to min(16, number of queue entries). Thanks.
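A minimal sketch of that sizing rule, combined with the short Queue.get() timeout discussed in this thread. This is not the gppylib WorkerPool code: pick_worker_count, run_all, and handler are hypothetical names, and it is written for Python 3 (queue) rather than the Python 2 Queue module.

```python
import queue
import threading

MAX_WORKERS = 16  # the existing WorkerPool default, used here as the cap


def pick_worker_count(num_items, cap=MAX_WORKERS):
    """Never start more threads than there are queued items."""
    return min(cap, num_items)


def run_all(items, handler, cap=MAX_WORKERS, poll_timeout=0.1):
    """Run handler(item) for every item on a bounded set of threads."""
    items = list(items)
    work = queue.Queue()
    for item in items:
        work.put(item)

    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                # Short timeout: an idle worker exits quickly instead of
                # blocking for 5 seconds once the queue has drained.
                item = work.get(timeout=poll_timeout)
            except queue.Empty:
                return
            out = handler(item)
            with lock:
                results.append(out)

    threads = [threading.Thread(target=worker)
               for _ in range(pick_worker_count(len(items), cap))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With three queued commands this starts three threads rather than 16, so there are no idle workers waiting out the timeout.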
2016-08-23 23:06 GMT+08:00 Kyle Dunn <[email protected]>:

> Paul,
>
> This is a great finding! I think additional worker threads might make sense
> when clusters are larger but otherwise a number like 8 is a safe bet,
> especially given its positive impact on the user experience in the majority
> of cases.
>
> +1 for tuning these down to improve latency.
>
> -Kyle
>
> On Tue, Aug 23, 2016, 08:31 Paul Guo <[email protected]> wrote:
>
> > Recently I noticed hawq-config seems to be slow, e.g. a simple GUC
> > setting command line "hawq config -c lc_messages -v en_US.UTF-8" roughly
> > costs 6+ seconds on my CentOS VM, but looking into the details of the
> > command line, I found this is really not expected.
> >
> > After a quick look into the hawq-config and Python lib code, it looks
> > like the several issues below affect the speed.
> >
> > 1) gpscp
> > It still uses popen2.Popen4(). This function ends up issuing millions of
> > useless close() system calls in the above test command. Using
> > subprocess.Popen() without close_fds as an alternative easily resolves
> > this.
> >
> > 2) gppylib/commands/base.py
> >
> > def __init__(self,name,pool,timeout=5):
> >
> > The worker thread will block for at most 5 seconds in each loop
> > (Queue.get()) to fetch potential commands, even when we already know that
> > there will be no more commands to run for some threads. This really does
> > not make sense, since some idle threads will also block for 5 seconds
> > before exiting.
> >
> > Setting the timeout to zero will make the Python code spin. I tested a
> > small timeout value, e.g. 0.1s, and it works fine. It seems that 0.1 is a
> > good timeout candidate.
> >
> > 3) gppylib/commands/base.py
> >
> > def __init__(self,numWorkers=16,items=None):
> >
> > WorkerPool by default creates 16 threads, but to my knowledge CPython's
> > Thread does not work well due to the global GIL lock. I'm not a Python
> > expert, so I'm wondering whether a smaller thread count (e.g. 8) is
> > really enough, either in theory or in practice (e.g. previous test
> > results)?
> >
> --
> *Kyle Dunn | Data Engineering | Pivotal*
> Direct: 303.905.3171 <3039053171> | Email: [email protected]
>
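For item 1) in the quoted message, a rough sketch of how subprocess.Popen() can stand in for popen2.Popen4(), which merged stdout and stderr into one stream. This is not the gpscp code: run_merged is a hypothetical helper, and the example targets Python 3, where close_fds defaults to True and so has to be turned off explicitly (in Python 2 it already defaulted to False).

```python
import subprocess
import sys


def run_merged(argv):
    """Run argv and return (returncode, combined stdout+stderr text)."""
    proc = subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout, like Popen4
        close_fds=False,           # do not walk and close inherited fds
    )
    output, _ = proc.communicate()
    return proc.returncode, output.decode()


if __name__ == "__main__":
    rc, text = run_merged([sys.executable, "-c", "print('hello')"])
    print(rc, text.strip())
```

Leaving close_fds off means the child inherits any inheritable descriptors from the parent, so this only makes sense when the parent holds nothing sensitive open.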
