Benchmarks & Performance goals
------------------------------

         Key: NUTCH-50
         URL: http://issues.apache.org/jira/browse/NUTCH-50
     Project: Nutch
        Type: Task
  Components: searcher  
 Environment: Linux, Windows
    Reporter: byron miller


I am interested in developing a strategy and toolset for benchmarking Nutch 
search.  Please give your feedback on the following approaches, or your 
recommendations for setting standards and goals.

Example test cases:

-- single node --

JDK 1.4.x 32 bit/Linux Platform
Single Node/2 gigs of memory
Single Index/Segment
1 million pages

JDK 1.4.x 32 bit/Linux Platform
Single Node/2 gigs of memory
Single Index/Segment
10 million pages

-- dual node --

JDK 1.4.2 32 bit/Linux Platform
2 Node/2 gigs of memory
2 Indexes/Segments (1 per node)
1 million pages


-- test queries --

* single term
* term AND term
* exact "small phrase"
* lang:en term
* term cluster

-- standards --

10 results per page
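
As a rough sketch of how the query types and the 10-results-per-page standard
could be turned into concrete requests: the base URL, the search.jsp path and
the "query" / "hitsPerPage" parameter names below are assumptions about the
deployment under test and should be checked against it. The query strings
simply mirror the list above; "term cluster" is left out because its exact
syntax is not specified here.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; point this at the search front end being benchmarked.
BASE_URL = "http://localhost:8080/search.jsp"
HITS_PER_PAGE = 10  # the "10 results per page" standard


def build_queries(term_a, term_b, phrase):
    """Return one example query string per query type from the list above."""
    return {
        "single term": term_a,
        "term AND term": f"{term_a} AND {term_b}",
        "exact phrase": f'"{phrase}"',
        "lang:en term": f"lang:en {term_a}",
    }


def search_url(query, base_url=BASE_URL, hits_per_page=HITS_PER_PAGE):
    """Build a request URL; the parameter names are assumed, not confirmed."""
    params = {"query": query, "hitsPerPage": hits_per_page}
    return base_url + "?" + urlencode(params)
```

Each benchmark run would then request search_url(q) for every q in
build_queries(...).values(), so the same query mix is used on every
configuration.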


---------------------

For me, test cases will help establish scalability, bottlenecks, suitable 
application environments, settings and such.  Given the number of 
customizations available, we really need to settle on the best baseline 
configuration for X documents and on some kind of scalability scale.  For 
example, a 10 node system may only scale x percent better for some reason y, 
with z as the bottleneck for that scenario.
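
One possible way to put a number on "scales x percent better" is to compare
measured throughput on N nodes against a single-node baseline. This is only a
sketch: the function name and the choice of queries per second as the metric
are assumptions, and the figures in the usage note are made up for
illustration.

```python
def scaling_efficiency(baseline_qps, cluster_qps, nodes):
    """Return (speedup, efficiency) for a cluster versus one node.

    speedup    = cluster throughput / single-node throughput
    efficiency = speedup / nodes (1.0 would be perfect linear scaling)
    """
    if baseline_qps <= 0 or nodes <= 0:
        raise ValueError("baseline_qps and nodes must be positive")
    speedup = cluster_qps / baseline_qps
    return speedup, speedup / nodes
```

For instance, if one node sustained 100 queries/sec and ten nodes sustained
600, scaling_efficiency(100, 600, 10) would report a 6x speedup at 60 percent
efficiency, and the missing 40 percent is what the bottleneck hunt would have
to explain.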

Test cases would serve multiple purposes: measuring performance, response 
time and application stability.
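
A minimal sketch of what a single run could record, assuming a hypothetical
search(query) callable that performs one request and raises on failure.
Throughput stands in for performance, latency percentiles for response time,
and the failure count for stability; a real harness (JMeter, The Grinder)
would add concurrency and ramp-up, which this deliberately omits.

```python
import time


def percentile(sorted_values, pct):
    """Nearest-rank percentile; pct is an integer from 1 to 100."""
    rank = (pct * len(sorted_values) + 99) // 100  # ceil(pct * n / 100)
    return sorted_values[max(rank, 1) - 1]


def run_benchmark(search, queries):
    """Run each query once, sequentially, and summarize the results."""
    latencies, failures = [], 0
    started = time.perf_counter()
    for query in queries:
        t0 = time.perf_counter()
        try:
            search(query)  # hypothetical callable supplied by the caller
        except Exception:
            failures += 1  # count failures instead of aborting the run
            continue
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - started
    latencies.sort()
    return {
        "queries": len(queries),
        "failures": failures,
        "throughput_qps": len(latencies) / elapsed if elapsed > 0 else 0.0,
        "p50_s": percentile(latencies, 50) if latencies else None,
        "p95_s": percentile(latencies, 95) if latencies else None,
    }
```

Running the same query list against each configuration and comparing the
resulting dictionaries gives a first, coarse view of where response time or
stability starts to degrade.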

Tools/possibilities:

* JMX components
* http://grinder.sourceforge.net/
* JMeter
* others???

---------------------

Query "stuffing" - use of a dictionary that contains broad and vastly 
different terms.  This could also be scripted as a "warm up" for production 
systems.  Possibly combine it with terms from our logs of common search 
queries to use as a benchmark?
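
The idea above could be sketched roughly as follows. The function name, the
input lists and the fixed seed are all hypothetical; the seed is only there
so that the same warm-up or benchmark query stream can be replayed on every
configuration.

```python
import random


def stuffing_queries(dictionary_terms, logged_queries, count, seed=0):
    """Draw `count` queries from dictionary terms plus logged queries.

    Duplicates across the two sources are dropped (first occurrence wins),
    and a seeded generator makes the sequence repeatable between runs.
    """
    pool = list(dict.fromkeys(list(dictionary_terms) + list(logged_queries)))
    if not pool:
        raise ValueError("need at least one term or logged query")
    rng = random.Random(seed)
    return [rng.choice(pool) for _ in range(count)]
```

The resulting list can be fed to a load tool as-is, or replayed once against
a freshly started searcher as a warm-up before real traffic arrives.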

What feedback or ideas do you have on building a good test case/stress 
testing system/framework?
