Hi Min,

The operation of StorageManagerV2 is as follows. The ScanScheduler coordinates read requests across the disks. When it receives a number of read requests, it first finds the DiskFileScanScheduler that currently has the fewest read requests assigned, and then assigns a read request to it. This process is repeated for the remaining read requests. Each DiskFileScanScheduler creates FileScanRunners for every request assigned to it. A FileScanRunner simply reads data using a fixed-size buffer.

You can see the related issue at https://issues.apache.org/jira/browse/TAJO-178, and this figure <https://issues.apache.org/jira/secure/attachment/12602567/tajo_storage_manager.png> will help you understand.

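To make the assignment step concrete, here is a minimal Java sketch of the "pick the scheduler with the fewest assigned requests" loop. It is not the actual Tajo code: the class LeastLoadedDemo, the method assign, and the lowest-index tie-breaking rule are all hypothetical, and each DiskFileScanScheduler is modeled only as a count of pending requests.

```java
import java.util.Arrays;

// Hypothetical illustration only; these names do not exist in Tajo.
class LeastLoadedDemo {

    // pendingPerDisk[d] is the number of read requests already assigned to
    // disk scheduler d. Returns, for each new request in arrival order, the
    // index of the disk scheduler that receives it.
    static int[] assign(int[] pendingPerDisk, int numRequests) {
        if (pendingPerDisk.length == 0) {
            throw new IllegalArgumentException("at least one disk scheduler is required");
        }
        // Work on a copy so the caller's array is not modified.
        int[] pending = Arrays.copyOf(pendingPerDisk, pendingPerDisk.length);
        int[] chosen = new int[numRequests];

        for (int r = 0; r < numRequests; r++) {
            // Find the scheduler with the minimum number of assigned requests.
            // Ties go to the lowest index (an assumption of this sketch).
            int min = 0;
            for (int d = 1; d < pending.length; d++) {
                if (pending[d] < pending[min]) {
                    min = d;
                }
            }
            pending[min]++;   // the request is now assigned to that scheduler
            chosen[r] = min;
        }
        return chosen;
    }

    public static void main(String[] args) {
        // Three disk schedulers with 2, 0, and 1 requests already assigned.
        System.out.println(Arrays.toString(assign(new int[] {2, 0, 1}, 4)));
        // prints [1, 1, 2, 0]
    }
}
```

In this example the first two requests go to scheduler 1 because it starts out empty, the third goes to scheduler 2, and only then does scheduler 0 receive one, so the load evens out across the disks.
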
Although StorageManagerV2 was designed to accelerate reads by scheduling disk scans, its performance did not meet our expectations. As you said, its thread model is too complex, and that may degrade performance. So StorageManager, which is the default, is mainly used instead of StorageManagerV2.

Thanks,
Jihoon

2014-02-01 Min Zhou <[email protected]>:

> Hi all,
>
> Seems the thread model of tajo storage layer is quite complex.
> Each call of StorageManagerFactory.getStorageManager(TajoConf) creates
> one instance of StorageManagerV2, which creates a scan scheduler thread
> and several disk file scan schedulers threads. Why those threads are
> needed? What's their function? How do those threads work with file
> scanners?
>
>
> Regards,
> Min
> --
> My research interests are distributed systems, parallel computing and
> bytecode based virtual machine.
>
> My profile:
> http://www.linkedin.com/in/coderplay
> My blog:
> http://coderplay.javaeye.com
>
