Re: contrib/benchmark questions

Grant Ingersoll Thu, 22 Mar 2007 19:21:32 -0800

OK, Doron (and other benchmarkers!), on to search:

Here's my alg file:


#Indexing declaration up here

OpenReader
    { "SrchSameRdr" Search > : 5000
    { "SrchTrvSameRdr" SearchTrav > : 5000
    { "SrchTrvSameRdrTopTen" SearchTrav(10) > : 5000
    { "SrchTrvRetLoadAllSameRdr" SearchTravRet > : 5000
    #Skip bytes and body
    { "SrchTrvRetLoadSomeSameRdr" SearchTravRetLoadFieldSelector(docid,docname,docdate,doctitle) > : 5000
CloseReader

Never mind the last task; I will be submitting a patch shortly that
will make sense out of it. Essentially, it specifies which fields to
load for the document.


Here are the results:

Operation                       round  merge  max.buffered  runCnt  recsPerRun     rec/s  elapsedSec  avgUsedMem  avgTotalMem
OpenReader                          0     10            10       1           1     125.0        0.01   5,385,600    9,965,568
SrchSameRdr_5000                    0     10            10       1        5000   1,184.3        4.22   5,805,120    9,965,568
SrchTrvSameRdr_5000                 0     10            10       1      427500  71,776.4        5.96   5,806,144    9,965,568
SrchTrvSameRdrTopTen_5000           0     10            10       1      427500  62,001.4        6.89   5,766,584    9,965,568
SrchTrvRetLoadAllSameRdr_5000       0     10            10       1      850000   7,226.4      117.62   6,161,728    9,965,568
SrchTrvRetLoadSomeSameRdr_5000      0     10            10       1      850000  10,334.0       82.25   6,162,752    9,965,568
CloseReader                         0     10            10       1           1   1,000.0        0.00   5,921,856    9,965,568


The column I'm a bit confused by is recsPerRun.

For the tasks that are doing the traversal and the retrieval, why so
many recsPerRun? Is it counting the hits, the traversals and the
retrievals each as one record?
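One way to sanity-check the recsPerRun figures is to divide each by the
5000 repetitions. The sketch below uses only the numbers from the report
above. The counting rule it tests (one record per search, one per
traversed hit, one per retrieved document) is the hypothesis from the
question, not something confirmed from the ReadTask source, and the
variable names are made up for illustration.

```python
REPETITIONS = 5000

# recsPerRun values copied from the report.
recs_per_run = {
    "SrchSameRdr_5000": 5000,
    "SrchTrvSameRdr_5000": 427500,
    "SrchTrvSameRdrTopTen_5000": 427500,
    "SrchTrvRetLoadAllSameRdr_5000": 850000,
    "SrchTrvRetLoadSomeSameRdr_5000": 850000,
}

# Records attributed to a single repetition of each task.
per_search = {name: total / REPETITIONS for name, total in recs_per_run.items()}

# Hypothesis: 1 record for the search itself, plus 1 per traversed hit.
avg_hits = per_search["SrchTrvSameRdr_5000"] - 1

# If retrieval adds 1 more record per hit, search + traverse + retrieve
# should come to 1 + 2 * avg_hits records per repetition.
predicted_with_retrieval = 1 + 2 * avg_hits

for name, value in per_search.items():
    print(f"{name}: {value} records per repetition")
print(f"implied average hits per search: {avg_hits}")
print(f"predicted records with retrieval: {predicted_with_retrieval}")
```

Under that assumption the numbers line up: 85.5 records per repetition
for search plus traversal implies about 84.5 hits per search, and
1 + 2 * 84.5 = 170 matches the 850000 / 5000 seen for the retrieval
tasks. The hypothesis alone does not explain why the TopTen task reports
the same count as the full traversal.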


What I am trying to do is compare:
Search
Search plus traversal of all hits
Search plus traversal of top ten
Search plus traversal and retrieval of all documents and all fields
on the document
Search plus traversal and retrieval of all documents and some fields
on the document

I think I see in the ReadTask that it is the res var that is being
incremented and would have to be altered. I guess I can go by
elapsed time, but even that seems slightly askew. I think this is
due to the withRetrieve() function overhead inside the for loop. I
have moved it out and will submit that change, too.
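The change described here, evaluating withRetrieve() once instead of on
every pass through the traversal loop, can be sketched as follows. This
is a simplified Python model and not the actual ReadTask code: the
function names, the hits list and the counting rule are all assumptions
made for illustration.

```python
def count_records_check_inside(hits, with_retrieve):
    """Model of the original shape: the flag is queried once per hit."""
    res = 0
    for _doc_id in hits:
        res += 1                  # one record per traversed hit
        if with_retrieve():       # called on every iteration
            res += 1              # one record per retrieved document
    return res


def count_records_hoisted(hits, with_retrieve):
    """Same result, but the loop-invariant flag is read only once."""
    retrieve = with_retrieve()    # evaluated before the loop
    res = 0
    for _doc_id in hits:
        res += 1
        if retrieve:
            res += 1
    return res
```

Both functions return the same count; the second simply avoids one call
per hit, which matters only if that call is not trivially cheap.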


Am I interpreting this correctly?

-Grant

On Mar 19, 2007, at 5:11 PM, Doron Cohen wrote:

Grant Ingersoll <[EMAIL PROTECTED]> wrote on 19/03/2007 13:10:16:

So, if I am understanding correctly:

"SearchSameRdr" Search > : 5000


means don't collect indiv. stats for SearchSameRdr, but do whatever
that task does 5000 times, right?


Almost...

It should be btw
   { "SearchSameRdr" Search > : 5000

and it means: run Search 5000 times, sequentially, assign the name
"SearchSameRdr" to that sequence of 5000, and do not collect
individual stats for the individual tasks making up that sequence.

If it was just
  { Search > : 5000

it would still mean the same, just that a name was assigned to this for
you, something like: "Seq_Search_5000".

If it was:
   { "SearchSameRdr" Search } : 5000

it would be the same as your example, just that stats would be collected
not only for the entire elapsed sequence, but also broken down for each
of the 5000 calls to "Search".

Similar logic with
  [ .. ]
and
  [ .. >
just that the tasks making the (parallel) sequence are executed in
parallel, each in a separate thread.
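
The bracket rules above can be summarized in a small sketch. This is a
hypothetical mini-parser for a single one-line sequence, written only to
illustrate the rules; it is not the benchmark's own parser, the Sequence
class and parse_sequence function are invented names, and the default
name pattern follows the "something like" example above, so the real
default may differ.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches: OPEN ["name"] Task[(params)] CLOSE [: repetitions]
_SEQ = re.compile(
    r'^\s*(?P<open>[{\[])\s*'
    r'(?:"(?P<name>[^"]+)"\s+)?'
    r'(?P<task>[A-Za-z_]\w*(?:\([^)]*\))?)\s*'
    r'(?P<close>[}\]>])\s*'
    r'(?::\s*(?P<reps>\d+))?\s*$'
)


@dataclass
class Sequence:
    name: Optional[str]
    task: str
    repetitions: int
    parallel: bool        # '[' runs the tasks in parallel threads
    per_task_stats: bool  # '>' suppresses stats for the individual tasks


def parse_sequence(line: str) -> Sequence:
    m = _SEQ.match(line)
    if m is None:
        raise ValueError(f"not a one-line sequence: {line!r}")
    opener, closer = m.group("open"), m.group("close")
    # '{' pairs with '}' or '>', '[' pairs with ']' or '>'.
    if closer != ">" and {"{": "}", "[": "]"}[opener] != closer:
        raise ValueError(f"mismatched brackets: {line!r}")
    task = m.group("task")
    reps = int(m.group("reps") or 1)
    parallel = opener == "["
    name = m.group("name")
    if name is None and not parallel:
        # Assumed default, patterned on the "Seq_Search_5000" example.
        name = f"Seq_{task}_{reps}"
    return Sequence(name, task, reps, parallel, closer != ">")
```

For example, parse_sequence('{ "SearchSameRdr" Search > : 5000') gives a
sequential sequence of 5000 repetitions with per-task stats switched
off, while the same line closed with } switches them on.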

3. Is there any way to dump out the stats as a CSV file or something?
Would I implement a Task for this? Ultimately, I want to be able to
create a graph in Excel that shows tradeoffs between speed and
memory.


Yes, implementing a report task would be the way.
... but when I look at how I implemented these reports, all the work is
done in the class Points. It seems it should be modified a little, with
more thought given to making it easier to extend reports.


I may take a crack at it, but deadline for the talk is looming


I'll take a look too, let you know if I have anything.
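
Until a report task exists, one workaround is to post-process the text
report outside the benchmark. The sketch below is only that: a
standalone script, not a benchmark Task. It assumes whitespace-separated
report lines, an optional "[java]" prefix from ant, and " - " filler
tokens; the column names "merge" and "max.buffered" are read off the
report header in this thread and are an assumption for other runs.

```python
import csv
import io

COLUMNS = [
    "Operation", "round", "merge", "max.buffered", "runCnt",
    "recsPerRun", "rec/s", "elapsedSec", "avgUsedMem", "avgTotalMem",
]


def report_to_csv(lines):
    """Convert report lines to CSV text, dropping filler and separators."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(COLUMNS)
    for line in lines:
        # Drop the ant prefix and the '-' tokens used as visual filler.
        tokens = [t for t in line.split() if t not in ("-", "[java]")]
        # Skip the header row and anything that is not a full data row.
        if len(tokens) != len(COLUMNS) or tokens[0] == "Operation":
            continue
        # Remove thousands separators so Excel reads the values as numbers.
        writer.writerow([tokens[0]] + [t.replace(",", "") for t in tokens[1:]])
    return out.getvalue()
```

Writing the returned string to a .csv file gives something Excel can
open directly, with rec/s and avgUsedMem available as plain numeric
columns for a speed versus memory chart.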

- Being interested in memory stats: the fact that all the rounds run in
a single program, in the same JVM run, usually means that what you see
depends very much on the GC behavior of the specific VM you are using.
If it does not release memory to the OS (most likely), you would not be
able to notice that round i+1 used less memory than round i. For
something like this it would probably be better to put the "round"
logic in an ant script, invoking each round in a separate new exec. But
then things get more complicated for producing a final stats report
containing all rounds. What do you think about this?
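
The idea of running each round in its own process and still ending up
with one combined report can be sketched outside ant as well. The driver
below is a generic illustration, not part of the benchmark: run_rounds
is an invented name, and the command for each round (for instance a
java invocation with a per-round alg file) is left to the caller, since
the thread does not specify one.

```python
import subprocess


def run_rounds(commands):
    """Run each round's command in a fresh process; collect its stdout.

    A new process per round means a new JVM per round when the command
    starts Java, so memory held by one round cannot mask the next.
    """
    outputs = []
    for cmd in commands:
        done = subprocess.run(cmd, capture_output=True, text=True, check=True)
        outputs.append(done.stdout)
    return outputs
```

The collected outputs can then be concatenated or parsed into a single
report covering all rounds, which is the part that gets more complicated
once the rounds no longer share one program.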


Good to know.  Perhaps a GarbageCollectionTask is needed?


ResetSystemSoft and ResetSystemErase both call GC;
Is this sufficient, task wise?

The concern is that this is not enough gc/mem wise, because the JVM
already has some memory that the OS is not going to reclaim.

So, I should wrap those tasks in an OpenReader/CloseReader?


Yes, if you want the same reader object to be used by all these.



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


------------------------------------------------------
Grant Ingersoll
http://www.grantingersoll.com/
http://lucene.grantingersoll.com
http://www.paperoftheweek.com/


