Dear Poojit,

Thanks for the email and the detailed description; some thoughts inline below:
-----Original Message-----
From: Poojit Sharma <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Monday, November 24, 2014 at 10:20 PM
To: "[email protected]" <[email protected]>
Subject: Utilizing all available cores

>Hi everyone,
>
>I am using OODT Radix v0.7 and need some help in fine tuning my system.
>Let me give you all an overview of my setup.
>
>I am crawling files in a directory using 'crawler' and ingesting it into
>the 'file manager'. I have a PGE task setup which is triggered after
>successful ingestion into the 'file manager'. The PGE Task then posts the
>file to Solr.
>
>Everything works great but I would like to get the most of the available
>resources. Currently, I am running this on c3.x8large AWS EC2 instance
>which has 32 vCPUs. Since I have 2 million files, I have divided those
>files into 32 folders and I am running 32 instances of 'crawler_launcher'.
>When I monitor the system using 'htop' I don't see max CPU utilization. I
>also notice in PCS Status via OPSUI, that a number of files are queued.

Which files are queued? And what do you mean by queued - that PCS Status shows current ingests?

One thing you may want to do is expand the number of File Managers to achieve more throughput. For example, you could run 32 File Managers as well (probably too many; maybe something like the number of crawlers / 3, or ~10). Seed these File Managers with the same config, but run them on different ports.

>I also tried to set org.apache.oodt.cas.workflow.engine.minPoolSize and
>maxPoolSize to 32, as well as Solr's maxIndexingThreads to 32, but I think
>there is some bottleneck.

This depends on where you are running ingest. If it's in the crawler and File Manager, then more File Managers and crawlers will help. If you are ingesting from pipeline processing, more File Managers will help (and the crawler load is already distributed, since CAS-PGE tasks are distributed).

>
>Is there an option to set number of threads of the 'file manager'?
>Any help will be appreciated.
See above - more File Managers will help.

Cheers,
Chris

>
>Thanks,
>
>Poojit Sharma.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
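PPS - for reference, the workflow engine pool settings you tried live in the Workflow Manager's properties file (path assumed from a stock RADiX install, typically $OODT_HOME/workflow/etc/workflow.properties). As noted above, though, raising them only helps if pipeline processing is actually your bottleneck:

```properties
# Workflow Manager thread pool - only worth raising if CAS-PGE tasks,
# not crawler/File Manager ingest, are the bottleneck.
org.apache.oodt.cas.workflow.engine.minPoolSize = 6
org.apache.oodt.cas.workflow.engine.maxPoolSize = 32
```

Solr's maxIndexingThreads, by contrast, is set in solrconfig.xml (under <indexConfig> in Solr 4.x) and only governs concurrent indexing threads inside Solr itself.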
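PS - to make the "more File Managers on different ports" suggestion concrete, here is a rough sketch. The script path ($OODT_HOME/filemgr/bin/filemgr) and the FILEMGR_PORT variable are assumptions based on a stock RADiX layout - check them against your install. It only echoes the commands (a dry run) so you can sanity-check the port assignments before starting anything:

```shell
#!/bin/sh
# Dry-run launcher for N File Manager instances on consecutive ports.
# ASSUMPTIONS: stock RADiX layout under $OODT_HOME, and a filemgr start
# script that honors FILEMGR_PORT -- verify both before running for real.
OODT_HOME=${OODT_HOME:-/usr/local/oodt}
NUM_FMS=10     # roughly (number of crawlers) / 3, per the suggestion above
BASE_PORT=9000

i=0
while [ "$i" -lt "$NUM_FMS" ]; do
  port=$((BASE_PORT + i))
  # Drop the leading 'echo' to actually start the daemons.
  echo FILEMGR_PORT=$port "$OODT_HOME/filemgr/bin/filemgr" start
  i=$((i + 1))
done
```

Each instance should point at the same policy/config but its own port; your crawlers (and any pipeline ingest clients) can then be spread across the ports.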
