Thanks for the reply, Doron. I knew this email was targeted for you,
but thought it would be good to add to the user record.
On Mar 19, 2007, at 2:30 PM, Doron Cohen wrote:
Grant Ingersoll <[EMAIL PROTECTED]> wrote on 18/03/2007 10:16:14:
I'm using contrib/benchmark to do some tests for my ApacheCon talk
and have some questions.
1. In looking at micro-standard.alg, it seems like not all braces are
closed. Is a line ending a separator too?
'>' can be used as a closing character in place of either '}' or ']',
with the semantics: "do not collect/report separate statistics for the
contained tasks". See "Statistic recording elimination" in
http://lucene.apache.org/java/docs/api/org/apache/lucene/benchmark/byTask/package-summary.html
So, if I am understanding correctly:
"SearchSameRdr" Search > : 5000
means don't collect individual stats for SearchSameRdr, but do whatever
that task does 5000 times, right?
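For example (my own toy comparison, not from the docs): if the alg had

  { "SearchSameRdr" Search > : 5000

the Search task would still run 5000 times, but no separate stats lines
would be kept for those runs; closing the sequence with '}' instead of '>'
would record them.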
2. Is there any way to dump out what params are supported by the
various tasks? I am esp. uncertain about the Search related tasks.
Search related tasks do not take args. Perhaps a task should throw an
exception if a param is set but not supported - I think I'll add that.
Currently only AddDoc, DeleteDoc and SetProp take args. The section "Command
parameter" in
http://lucene.apache.org/java/docs/api/org/apache/lucene/benchmark/byTask/package-summary.html
which describes this is incomplete - I will fix it to reflect that.
Which query arguments do you have in mind?
Never mind, I was confused by the : XXXX parameters after the >
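For the archives, my reading of the arg syntax (values made up, untested):

  { AddDoc(2000) > : 10

i.e. add ten docs of roughly 2000 bytes each - the 2000 being the AddDoc arg
and the ": 10" the repeat count.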
3. Is there any way to dump out the stats as a CSV file or something?
Would I implement a Task for this? Ultimately, I want to be able to
create a graph in Excel that shows tradeoffs between speed and
memory.
Yes, implementing a report task would be the way.
... but when I look at how I implemented these reports, all the work is
done in the class Points. Seems it should be modified a little, with more
thought given to making it easier to extend reports.
I may take a crack at it, but the deadline for the talk is looming.
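Roughly what I have in mind for the CSV dump (just a sketch; the class name
and the Points/TaskStats accessor names are placeholders to check against the
actual javadocs, not the verified API):

  import java.io.FileWriter;
  import java.io.IOException;
  import java.util.Iterator;

  import org.apache.lucene.benchmark.byTask.PerfRunData;
  import org.apache.lucene.benchmark.byTask.stats.TaskStats;
  import org.apache.lucene.benchmark.byTask.tasks.PerfTask;

  /** Sketch of a report task that dumps the collected per-task stats to CSV. */
  public class RepCsvTask extends PerfTask {

    public RepCsvTask(PerfRunData runData) {
      super(runData);
    }

    public int doLogic() throws IOException {
      FileWriter out = new FileWriter("report.csv");
      out.write("task,count,elapsedMillis,maxUsedMem\n");
      // NOTE: taskStats() and the getters below are my guesses at the API
      Iterator it = getRunData().getPoints().taskStats().iterator();
      while (it.hasNext()) {
        TaskStats stats = (TaskStats) it.next();
        out.write(stats.getTask().getName() + "," + stats.getCount() + ","
            + stats.getElapsed() + "," + stats.getMaxUsedMem() + "\n");
      }
      out.close();
      return 0;
    }
  }

A comma-separated file like that should open directly in Excel, which is all
I need for the speed vs. memory graph.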
4. Is there a way to set how many tabs occur between columns in the
final report? The merge and buffer factor columns get hard to read for
larger values.
There's no general tabbing control - it could be added if required - but
for the automatically added columns it is not needed: just modify the name
of the column and it would fit, e.g. use "merge:10:100" to get a 5-character
column, or "merging:10:100" for 7, etc. (Also see "Index work parameters"
under "Benchmark properties" in
http://lucene.apache.org/java/docs/api/org/apache/lucene/benchmark/byTask/package-summary.html)
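Ah, so in the properties section of the alg file something like this (my
reading, untested):

  merge.factor=merge:10:100
  max.buffered=buf:10:100

would give a 5-character "merge" column and a 3-character "buf" column.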
5. Below is my "alg" file, any tips? What I am trying to do is show
the tradeoffs of merge factor and max buffered and how it relates to
memory and indexing time. I want to process all the documents in the
Reuters benchmark collection, not the 2000 in the micro-standard. I
don't want any pauses and for now I am happy doing things in serial.
I think it is doing what I want, but am not 100% certain.
Yes, it seems correct to me. What I usually do to verify a new alg is to
run it first with very small numbers - e.g. 10 instead of 22000, etc. - and
examine the log. A few comments:
- You can specify a larger number than 22000 and the DocMaker will iterate
and create new docs from the same input again.
- Being interested in memory stats - the fact that all the rounds run in a
single program, the same JVM run, usually means that what you see is very
much dependent on the GC behavior of the specific VM you are using. If it
does not release memory to the OS (most likely), you would not be able to
notice that round i+1 used less memory than round i. It would probably be
better for something like this to put the "round" logic in an ant script,
invoking each round in a separate new exec. But then things get more
complicated for having a final stats report containing all rounds. What do
you think about this?
Good to know. Perhaps a GarbageCollectionTask is needed?
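Something as simple as this is what I'm picturing (a sketch only - no such
task exists yet, and System.gc() is just a hint to the VM):

  import org.apache.lucene.benchmark.byTask.PerfRunData;
  import org.apache.lucene.benchmark.byTask.tasks.PerfTask;

  /** Hypothetical task: ask the JVM to collect garbage, e.g. between rounds. */
  public class GarbageCollectionTask extends PerfTask {

    public GarbageCollectionTask(PerfRunData runData) {
      super(runData);
    }

    public int doLogic() {
      // only a request, so the memory columns would still be approximate
      System.gc();
      return 1;
    }
  }

It wouldn't make rounds as independent as separate JVM runs, but it might
make the per-round memory numbers a bit more comparable.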
- Seems you are only interested in the indexing performance, so you can
remove (or comment out) the search part.
- If you are also interested in the search part, note that as written, the
last four search related tasks always use a new reader (opening/closing 950
readers in this test).
OK, search is the second part - I'm just focused on indexing first. I'm
trying to address common questions/issues people have with performance in
these two areas.
So, I should wrap those tasks in an OpenReader/CloseReader?
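Something like this, if I have the task names right (untested sketch):

  OpenReader
  { "SearchSameRdr" Search > : 5000
  CloseReader

so the 5000 searches share one reader instead of opening and closing one
each time.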
We may also want to consider making this an XML-based type of
configuration...
Thanks for your help. I will probably have a few more questions over
the next few days.
-Grant