Thanks for the reply, Doron. I knew this email was targeted for you,
but thought it would be good to add to the user record.
On Mar 19, 2007, at 2:30 PM, Doron Cohen wrote:
Grant Ingersoll <[EMAIL PROTECTED]> wrote on 18/03/2007 10:16:14:
I'm using contrib/benchmark to do some tests for my ApacheCon talk
and have some questions.
1. In looking at micro-standard.alg, it seems like not all braces are
closed. Is a line ending a separator too?
'>' can be used as a closing character in place of either '}' or ']',
with the semantics: "do not collect/report separate statistics for the
contained tasks". See "Statistic recording elimination" in
http://lucene.apache.org/java/docs/api/org/apache/lucene/benchmark/byTask/package-summary.html
So, if I am understanding correctly:
"SearchSameRdr" Search > : 5000
means don't collect individual stats for SearchSameRdr, but do whatever
that task does 5000 times, right?
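For example (my own toy comparison, not from the docs): if the alg had

  { "SearchSameRdr" Search > : 5000

the Search task would still run 5000 times, but no separate stats lines
would be kept for those runs; closing the sequence with '}' instead of '>'
would record them.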
2. Is there any way to dump out what params are supported by the
various tasks? I am esp. uncertain about the Search related tasks.
Search related tasks do not take args. Perhaps a task should throw an
exception if a param is set but not supported - I think I'll add that.
Currently only AddDoc, DeleteDoc and SetProp take args. The section "Command
parameter" in
http://lucene.apache.org/java/docs/api/org/apache/lucene/benchmark/byTask/package-summary.html
which describes this is incomplete - I will fix it to reflect that.
Which query arguments do you have in mind?
Never mind, I was confused by the : XXXX parameters after the >
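For the archives, my reading of the arg syntax (values made up, untested):

  { AddDoc(2000) > : 10

i.e. add ten docs of roughly 2000 bytes each - the 2000 being the AddDoc arg
and the ": 10" the repeat count.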
3. Is there any way to dump out the stats as a CSV file or something?
Would I implement a Task for this? Ultimately, I want to be able to
create a graph in Excel that shows tradeoffs between speed and
memory.
Yes, implementing a report task would be the way.
... but when I look at how I implemented these reports, all the work is
done in the class Points. Seems it should be modified a little, with more
thought given to making it easier to extend reports.
I may take a crack at it, but the deadline for the talk is looming.
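Roughly what I have in mind for the CSV dump (just a sketch; the class name
and the Points/TaskStats accessor names are placeholders to check against the
actual javadocs, not the verified API):

  import java.io.FileWriter;
  import java.io.IOException;
  import java.util.Iterator;

  import org.apache.lucene.benchmark.byTask.PerfRunData;
  import org.apache.lucene.benchmark.byTask.stats.TaskStats;
  import org.apache.lucene.benchmark.byTask.tasks.PerfTask;

  /** Sketch of a report task that dumps the collected per-task stats to CSV. */
  public class RepCsvTask extends PerfTask {

    public RepCsvTask(PerfRunData runData) {
      super(runData);
    }

    public int doLogic() throws IOException {
      FileWriter out = new FileWriter("report.csv");
      out.write("task,count,elapsedMillis,maxUsedMem\n");
      // NOTE: taskStats() and the getters below are my guesses at the API
      Iterator it = getRunData().getPoints().taskStats().iterator();
      while (it.hasNext()) {
        TaskStats stats = (TaskStats) it.next();
        out.write(stats.getTask().getName() + "," + stats.getCount() + ","
            + stats.getElapsed() + "," + stats.getMaxUsedMem() + "\n");
      }
      out.close();
      return 0;
    }
  }

A comma-separated file like that should open directly in Excel, which is all
I need for the speed vs. memory graph.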
4. Is there a way to set how many tabs occur between columns in the
final report? The merge and buffer factor columns get hard to read for
larger values.
There's no general tabbing control - it could be added if required - but
for the automatically added columns it is not needed: just modify the name
of the column and it would fit, e.g. use "merge:10:100" to get a 5-character
column, or "merging:10:100" for 7, etc. (Also see "Index work parameters"
under "Benchmark properties" in
http://lucene.apache.org/java/docs/api/org/apache/lucene/benchmark/byTask/package-summary.html)
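Ah, so in the properties section of the alg file something like this (my
reading, untested):

  merge.factor=merge:10:100
  max.buffered=buf:10:100

would give a 5-character "merge" column and a 3-character "buf" column.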
5. Below is my "alg" file, any tips? What I am trying to do is show
the tradeoffs of merge factor and max buffered and how it relates to
memory and indexing time. I want to process all the documents in the
Reuters benchmark collection, not the 2000 in the micro-standard. I
don't want any pauses and for now I am happy doing things in serial.
I think it is doing what I want, but am not 100% certain.
Yes, it seems correct to me. What I usually do to verify a new alg is to
run it first with very small numbers - e.g. 10 instead of 22000, etc. - and
examine the log. A few comments:
- You can specify a larger number than 22000 and the DocMaker will iterate
and create new docs from the same input again.
- Being interested in memory stats - the fact that all the rounds run in a
single program, the same JVM run, usually means that what you see is very
much dependent on the GC behavior of the specific VM you are using. If it
does not release memory to the OS (most likely), you would not be able to
notice that round i+1 used less memory than round i. It would probably be
better for something like this to put the "round" logic in an ant script,
invoking each round in a separate new exec. But then things get more
complicated for having a final stats report containing all rounds. What do
you think about this?
Good to know. Perhaps a GarbageCollectionTask is needed?
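Something as simple as this is what I'm picturing (a sketch only - no such
task exists yet, and System.gc() is just a hint to the VM):

  import org.apache.lucene.benchmark.byTask.PerfRunData;
  import org.apache.lucene.benchmark.byTask.tasks.PerfTask;

  /** Hypothetical task: ask the JVM to collect garbage, e.g. between rounds. */
  public class GarbageCollectionTask extends PerfTask {

    public GarbageCollectionTask(PerfRunData runData) {
      super(runData);
    }

    public int doLogic() {
      // only a request, so the memory columns would still be approximate
      System.gc();
      return 1;
    }
  }

It wouldn't make rounds as independent as separate JVM runs, but it might
make the per-round memory numbers a bit more comparable.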
- Seems you are only interested in the indexing performance, so you can
remove (or comment out) the search part.
- If you are also interested in the search part, note that as written, the
last four search related tasks always use a new reader (opening/closing 950
readers in this test).
OK, search is the second part - I'm just focused on indexing first. I'm
trying to address common questions/issues people have with performance in
these two areas.
So, I should wrap those tasks in an OpenReader/CloseReader?
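Something like this, if I have the task names right (untested sketch):

  OpenReader
  { "SearchSameRdr" Search > : 5000
  CloseReader

so the 5000 searches share one reader instead of opening and closing one
each time.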
We may also want to consider making this an XML-based type of
configuration...
Thanks for your help. I will probably have a few more questions over
the next few days.
-Grant