Thanks for the reply, Doron. I knew this email was targeted for you, but thought it would be good to add to the user record.

On Mar 19, 2007, at 2:30 PM, Doron Cohen wrote:

Grant Ingersoll <[EMAIL PROTECTED]> wrote on 18/03/2007 10:16:14:

I'm using contrib/benchmark to do some tests for my ApacheCon talk
and have some questions.

1. In looking at micro-standard.alg, it seems like not all braces are
closed.  Is a line ending a separator too?

'>' can replace either '}' or ']' as the closing character, with the
semantics "do not collect/report separate statistics for the
contained tasks". See "Statistic recording elimination" in
http://lucene.apache.org/java/docs/api/org/apache/lucene/benchmark/byTask/package-summary.html

So, if I am understanding correctly:

"SearchSameRdr" Search > : 5000

means don't collect individual stats for SearchSameRdr, but do whatever that task does 5000 times, right?



2. Is there any way to dump out what params are supported by the
various tasks?  I am esp. uncertain about the Search related tasks.

Search related tasks do not take args. Perhaps the task should throw an
exception if a param is set but not supported. I think I'll add that.
Currently only AddDoc, DeleteDoc and SetProp take args. The section "Command
parameter" in
http://lucene.apache.org/java/docs/api/org/apache/lucene/benchmark/byTask/package-summary.html
which describes this is incomplete - I will fix it to reflect that.

Which query arguments do you have in mind?

Never mind, I was confused by the : XXXX parameters after the >


3. Is there any way to dump out the stats as a CSV file or something?
Would I implement a Task for this?  Ultimately, I want to be able to
create a graph in Excel that shows tradeoffs between speed and memory.

Yes, implementing a report task would be the way.
... but when I look at how I implemented these reports, all the work is done in the class Points. It seems it should be modified a little, with more
thought given to making it easier to extend reports.

I may take a crack at it, but deadline for the talk is looming
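
Until a proper report task exists, one stopgap for getting numbers into Excel is to post-process the printed report. The following is a minimal Python sketch, not part of contrib/benchmark. It assumes the report is a whitespace-aligned table whose cells contain no spaces, and the sample text is a hypothetical illustration of the layout, not real benchmark output.

```python
import csv
import io


def report_to_csv(report_text):
    """Convert a whitespace-aligned report table to CSV text."""
    rows = []
    for line in report_text.splitlines():
        line = line.strip()
        # Skip blank lines and banner lines such as "------------> Report ...".
        if not line or line.startswith("-"):
            continue
        # Columns are separated by runs of spaces. Thousands separators are
        # dropped so that a spreadsheet reads "1,252,992" as a number.
        rows.append([cell.replace(",", "") for cell in line.split()])
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


# Hypothetical sample: the column names and values are made up for illustration.
sample = """\
------------> Report Sum By Prefix (MAddDocs) and Round
Operation      round  mrg  buf  runCnt  recsPerRun  rec/s  elapsedSec  avgUsedMem  avgTotalMem
MAddDocs_2000      0   10   10       1        2000  260.0        7.69   1,252,992    3,100,672
MAddDocs_2000      1  100   10       1        2000  310.5        6.44   1,874,432    3,100,672
"""

print(report_to_csv(sample), end="")
```

This only works while task names contain no spaces; a real report task inside the benchmark would not have that limitation.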



4. Is there a way to set how many tabs occur between columns in the
final report?  The merge and buffer factors get hard to read for
larger values.

There's no general tabbing control; it can be added if required. But for the automatically added columns this is not required - just modify the name of the column and it will fit, e.g. use "merge:10:100" to get a 5-character column, or "merging:10:100" for 7, etc. (Also see "Index work parameters"
under "Benchmark properties" in
http://lucene.apache.org/java/docs/api/org/apache/lucene/benchmark/byTask/package-summary.html)
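
To make the column-name trick concrete: a multi-valued property such as "merge:10:100" carries the report column name first, followed by the per-round values. The sketch below is a Python illustration only, not the benchmark's own Config code. The function names are made up, and the cyclic reuse of values when there are more rounds than values is an assumption.

```python
def parse_multi_valued(prop_value):
    """Split 'name:v1:v2:...' into (column_name, [values])."""
    name, *values = prop_value.split(":")
    return name, [int(v) for v in values]


def value_for_round(values, round_number):
    # Assumption: values are reused cyclically across rounds.
    return values[round_number % len(values)]


name, values = parse_multi_valued("merging:10:100")
# The column is as wide as its name, so a longer name gives a wider column.
print(name, len(name), [value_for_round(values, r) for r in range(4)])
```

So renaming "merge" to "merging" changes only the column header and its width, not the values used in each round.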

5. Below is my "alg" file, any tips?  What I am trying to do is show
the tradeoffs of merge factor and max buffered and how it relates to
memory and indexing time.  I want to process all the documents in the
Reuters benchmark collection, not the 2000 in the micro-standard.  I
don't want any pauses and for now I am happy doing things in serial.
I think it is doing what I want, but am not 100% certain.


Yes, it seems correct to me. What I usually do to verify a new alg is to run it first with very small numbers - e.g. 10 instead of 22000, etc., and
examine the log. A few comments:
- You can specify a larger number than 22000 and the DocMaker will iterate
and create new docs from the same input again.
- Since you are interested in memory stats: all the rounds run in a
single program, in the same JVM, so what you see depends heavily
on the GC behavior of the specific VM you are using. If it does not release memory to the OS (most likely), you would not be able to notice that round i+1 used less memory than round i. For something like this it would probably be better to put the "round" logic in an ant script, invoking each round in a separate new exec. But then it gets more complicated to produce a final stats report containing all rounds. What do you think about
this?

Good to know.  Perhaps a GarbageCollectionTask is needed?
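
For the separate-process idea, here is a rough sketch of a driver that writes one single-round alg file per setting and launches each in a fresh JVM. It is written in Python rather than as the ant script suggested above. The classpath, the output file names, and the trimmed alg template are placeholders; the template leaves out properties (such as the docs directory) that a real run would need. It also assumes the benchmark main class takes the alg file path as its only argument.

```python
import subprocess
from string import Template

# Placeholders: adjust the classpath to the local build.
CLASSPATH = "lucene-core.jar:lucene-benchmark.jar"
MAIN_CLASS = "org.apache.lucene.benchmark.byTask.Benchmark"

# Trimmed, hypothetical single-round alg. string.Template is used because
# alg files contain braces, which would clash with str.format.
TEMPLATE = """\
merge.factor=$merge_factor
max.buffered=$max_buffered
directory=FSDirectory

ResetSystemErase
{ "Populate"
    CreateIndex
    { "MAddDocs" AddDoc > : 22000
    Optimize
    CloseIndex
}
RepSumByName
"""


def alg_for_round(template, merge_factor, max_buffered):
    """Fill the template with one (merge factor, max buffered) pair."""
    return Template(template).substitute(
        merge_factor=merge_factor, max_buffered=max_buffered
    )


def command_for(alg_path):
    """Build the command line that runs one alg file in its own JVM."""
    return ["java", "-cp", CLASSPATH, MAIN_CLASS, alg_path]


def run_rounds(template, settings):
    """Run each setting in a separate process so rounds share no heap."""
    for i, (merge_factor, max_buffered) in enumerate(settings):
        path = f"round-{i}.alg"
        with open(path, "w") as f:
            f.write(alg_for_round(template, merge_factor, max_buffered))
        subprocess.run(command_for(path), check=True)
```

Calling run_rounds(TEMPLATE, [(10, 10), (100, 10)]) would start two JVMs, one per setting. Each run prints its own report, so combining them into one final table remains a separate step, as noted above.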


- It seems you are only interested in the indexing performance, so you can
remove (or comment out) the search part.
- If you are also interested in the search part, note that as written, the last four search related tasks always use a new reader (opening/closing 950
readers in this test).

OK, search is the second part, just focused on indexing first. Trying to address common questions/issues people have with performance in these two areas.

So, I should wrap those tasks in an OpenReader/CloseReader?

We may also want to consider making this an XML-based configuration...

Thanks for your help. I will probably have a few more questions over the next few days.

-Grant
