Benchmarks & Performance goals
------------------------------
Key: NUTCH-50
URL: http://issues.apache.org/jira/browse/NUTCH-50
Project: Nutch
Type: Task
Components: searcher
Environment: Linux, Windows
Reporter: byron miller
I am interested in developing a strategy and toolset for benchmarking Nutch
search. Please give your feedback on the following approaches, or your
recommendations for setting standards and goals.
Example test case(s).
JDK 1.4.x 32 bit/Linux Platform
Single Node/2 gigs of memory
Single Index/Segment
1 million pages
-- single node --
JDK 1.4.x 32 bit/Linux Platform
Single Node/2 gigs of memory
Single Index/Segment
10 million pages
-- dual node --
JDK 1.4.2 32 bit/Linux Platform
2 Node/2 gigs of memory
2 Indexes/Segments (1 per node)
1 million pages
-- test queries --
* single term
* term AND term
* exact "small phrase"
* lang:en term
* term cluster
-- standards --
10 results per page
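
As a rough sketch of how the test queries above could be generated for a
load tool, the snippet below builds one query string per query type and a
request URL that asks for 10 results per page. The base URL and the
parameter names ("query", "hitsPerPage") are assumptions made for
illustration and should be checked against the search front end actually
deployed. The "term cluster" case is left out because its exact query form
is not specified here.

```python
from urllib.parse import urlencode

# The "10 results per page" standard from the list above.
HITS_PER_PAGE = 10


def build_queries(term_a, term_b, phrase):
    """Return one query string per query type (except "term cluster")."""
    return {
        "single_term": term_a,
        "term_and_term": f"{term_a} AND {term_b}",
        "exact_phrase": f'"{phrase}"',
        "lang_term": f"lang:en {term_a}",
    }


def search_url(base_url, query, query_param="query", hits_param="hitsPerPage"):
    """Build a request URL. The parameter names are hypothetical defaults."""
    params = {query_param: query, hits_param: HITS_PER_PAGE}
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical host and path, used only to show the output shape.
    for name, q in build_queries("apache", "lucene", "open source").items():
        print(name, search_url("http://localhost:8080/search", q))
```

Keeping query construction separate from URL construction means the same
query set can be replayed against a single-node and a dual-node setup.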
---------------------
For me, a test case will help expose scalability limits, bottlenecks, and the
effects of application environments, settings and such. Given the amount of
customization available, we really need to establish the best baseline for X
documents, along with some kind of scalability scale. For example, a 10 node
system may only scale x percent better for x reasons, and x is the bottleneck
for that scenario.
Test cases would serve multiple purposes: measuring performance, response
time and application stability.
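
One possible way to put numbers on the "scalability scale" and on response
time is sketched below. It is only an illustration: throughput is assumed to
be measured in queries per second, the efficiency definition (speedup divided
by node count) is one common convention rather than an agreed standard for
this project, and the function names are made up for the example.

```python
import math


def scaling_efficiency(qps_single, qps_cluster, nodes):
    """Return (speedup, efficiency) of a cluster relative to one node.

    speedup    = cluster throughput / single-node throughput
    efficiency = speedup / number of nodes (1.0 means linear scaling)
    """
    speedup = qps_cluster / qps_single
    return speedup, speedup / nodes


def percentile(samples, pct):
    """Nearest-rank percentile of a list of response times."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


if __name__ == "__main__":
    # Made-up numbers, only to show how the functions are called.
    speedup, efficiency = scaling_efficiency(50.0, 350.0, 10)
    print(f"speedup={speedup:.1f}x efficiency={efficiency:.0%}")
    print("p95 ms:", percentile([120, 80, 100, 300, 90], 95))
```

Reporting a high percentile alongside throughput helps show whether adding
nodes improves the slowest queries or only the average case.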
Tools/possibilities:
* JMX components
* http://grinder.sourceforge.net/
* JMeter
* others???
---------------------
Query "stuffing" - use of a dictionary that contains broad & vastly different
terms. This is something that could be scripted as a "warm up" for production
systems as well. Possibly combine it with terms from our logs of common search
queries to use as a benchmark?
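
A minimal sketch of the query "stuffing" idea follows, assuming the
dictionary is a list of words and the query log is a list of lines with one
query per line. Both input formats and all function names are assumptions
made for illustration. A fixed seed keeps the random picks repeatable so that
benchmark runs can be compared.

```python
import random
from collections import Counter


def top_logged_queries(log_lines, limit):
    """Return the most frequent queries from a log, one query per line."""
    counts = Counter(
        line.strip().lower() for line in log_lines if line.strip()
    )
    return [query for query, _ in counts.most_common(limit)]


def warmup_queries(dictionary_terms, log_lines, n_random, n_logged, seed=0):
    """Combine common logged queries with random dictionary terms."""
    rng = random.Random(seed)  # fixed seed, so runs are repeatable
    unique_terms = sorted(set(dictionary_terms))
    picks = rng.sample(unique_terms, min(n_random, len(unique_terms)))
    return top_logged_queries(log_lines, n_logged) + picks


if __name__ == "__main__":
    # Tiny made-up inputs, only to show the call.
    words = ["zebra", "quantum", "harbor", "violin"]
    log = ["nutch", "lucene", "Nutch ", "apache"]
    print(warmup_queries(words, log, n_random=2, n_logged=1))
```

The same list could be fed to a load tool as a warm-up pass before the timed
test queries are run.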
What are your feedback/ideas on building a good test case and stress testing
system/framework?