Hi everyone,

I am using OODT Radix v0.7 and need some help fine-tuning my system. Let me give you all an overview of my setup.
I am crawling files in a directory with 'crawler' and ingesting them into the 'file manager'. I have a PGE task set up that is triggered after successful ingestion into the 'file manager'. The PGE task then posts the file to Solr.

Everything works great, but I would like to get the most out of the available resources. Currently, I am running this on a c3.8xlarge AWS EC2 instance, which has 32 vCPUs. Since I have 2 million files, I have divided them into 32 folders and am running 32 instances of 'crawler_launcher'.

When I monitor the system with 'htop', I don't see maximum CPU utilization. I also notice in PCS Status via OPSUI that a number of files are queued. I tried setting org.apache.oodt.cas.workflow.engine.minPoolSize and org.apache.oodt.cas.workflow.engine.maxPoolSize to 32, as well as Solr's maxIndexingThreads to 32, but I think there is still a bottleneck somewhere.

Is there an option to set the number of threads for the 'file manager'? Any help will be appreciated.

Thanks,
Poojit Sharma
