DIGY,
Yes, the problem is the cost of "throwing an exception" vs.
"returning a value".
The implication is this: if you have to throw an exception to stop
collecting results, instead of having a method return a value that says
"stop collecting" (a method that doesn't exist; you noted this yourself
earlier when you wondered why there is no way to indicate "stop
collecting"), then the design of the HitCollector interface is
**wrong**.
I definitely wouldn't suggest iterating through the entire result
set; while throwing an exception is the best option given what's available
now, that doesn't make it correct.
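To make the design point concrete, here is a minimal sketch of a collector whose Collect method returns a bool, so the search loop can stop without an exception. IStoppableCollector, StoppableSearchLoop and FirstNCollector are hypothetical names invented for illustration; this is not the Lucene.Net HitCollector API, which (as this thread notes) has no such return value.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical interface (NOT the Lucene.Net API): Collect returns
// true to keep receiving hits, false to ask the caller to stop.
public interface IStoppableCollector
{
    bool Collect(int doc, float score);
}

// Hypothetical stand-in for the loop inside a searcher.
public static class StoppableSearchLoop
{
    // Feeds (doc, score) pairs to the collector until it returns false.
    // Returns how many hits were actually delivered.
    public static int Run(IEnumerable<KeyValuePair<int, float>> hits, IStoppableCollector collector)
    {
        int delivered = 0;
        foreach (KeyValuePair<int, float> hit in hits)
        {
            delivered++;
            if (!collector.Collect(hit.Key, hit.Value))
            {
                break; // The collector asked to stop; no exception needed.
            }
        }
        return delivered;
    }
}

// Keeps the first maxDocs document ids (maxDocs >= 1), then asks to stop.
public class FirstNCollector : IStoppableCollector
{
    private readonly int maxDocs;
    private readonly List<int> docs = new List<int>();

    public FirstNCollector(int maxDocs)
    {
        this.maxDocs = maxDocs;
    }

    public List<int> Docs
    {
        get { return docs; }
    }

    public bool Collect(int doc, float score)
    {
        docs.Add(doc);
        return docs.Count < maxDocs;
    }
}
```

With this shape, the "at least one doc" check is just a FirstNCollector(1): the loop makes a single call and exits normally.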
- Nick
-----Original Message-----
From: Digy [mailto:[email protected]]
Sent: Thursday, January 14, 2010 5:16 PM
To: [email protected]
Subject: RE: at least one doc
The problem is not the cost of "throwing an exception" vs. "returning a
value".
If you pass a MaxDoc# to the HitCollector, you have the option "not to read"
the docs from the index beyond MaxDoc#.
The costly part of this process is "reader.Document(doc)".
The problem arises when you have millions of results: you have to iterate
over all of them even though you only want the top MaxDoc# docs.
The real performance question is which is better:
millions of empty iterations, or an exception (thrown at, say, 100 or 500)?
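A rough sketch of the MaxDoc# idea, assuming a hypothetical MaxDocCollector class and a hypothetical loadDocument delegate standing in for the expensive reader.Document(doc) call. The collector is still invoked once per hit, but past maxDocs each call only increments a counter and compares, which is the "empty iteration" being weighed against an exception.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical collector: loads at most maxDocs documents but is still
// called for every hit. loadDocument is an assumed stand-in for
// something like reader.Document(doc).
public class MaxDocCollector
{
    private readonly int maxDocs;
    private readonly Action<int> loadDocument;

    public int TotalHits { get; private set; }
    public int LoadedDocs { get; private set; }

    public MaxDocCollector(int maxDocs, Action<int> loadDocument)
    {
        this.maxDocs = maxDocs;
        this.loadDocument = loadDocument;
    }

    // Same (doc, score) shape as the Collect method discussed in the thread.
    public void Collect(int doc, float score)
    {
        TotalHits++;
        if (LoadedDocs < maxDocs)
        {
            loadDocument(doc); // Expensive part, done at most maxDocs times.
            LoadedDocs++;
        }
        // Past maxDocs, each call is only an increment and a comparison:
        // the "empty iteration" that still runs once per remaining hit.
    }
}
```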
DIGY
From: Nicholas Paldino [.NET/C# MVP] [mailto:[email protected]]
Sent: Thursday, January 14, 2010 11:35 PM
To: [email protected]
Subject: RE: at least one doc
Neal,
With all due respect, you are wrong. Here is a program that
demonstrates why (it requires .NET 3.5/C# 3.0, but can easily be converted
to 2.0 if necessary):
using System;
using System.Diagnostics;
namespace ExceptionTest
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("TryParse (no exception) (ticks/iteration): {0:000.0000}", TestTryParse());
            Console.WriteLine("Parse (exception) (ticks/iteration): {0:000.0000}", TestParse());
        }
        const int Iterations = 10000;
        static double Test(Func<bool> test)
        {
            // Do a garbage collection so both tests start
            // from the same baseline.
            GC.Collect();
            // Create and start the stopwatch.
            Stopwatch s = new Stopwatch();
            s.Start();
            // Iterate.
            for (int index = 0; index < Iterations; index++)
            {
                // Perform the action.
                test();
            }
            // Stop timing before reading the elapsed time.
            s.Stop();
            // Return the result: the elapsed ticks divided
            // by the number of iterations.
            return s.ElapsedTicks / (double) Iterations;
        }
        // Invalid parse string.
        const string InvalidParseString = "10240223 This will not parse";
        static double TestTryParse()
        {
            // Call the test method.
            return Test(() =>
            {
                // The output value.
                int val;
                // Try to parse; returns false instead of throwing.
                return Int32.TryParse(InvalidParseString, out val);
            });
        }
        static double TestParse()
        {
            // Call the test method.
            return Test(() =>
            {
                // The value.
                int val;
                // Wrap in a try/catch.
                try
                {
                    // Parse; throws FormatException for this input.
                    val = Int32.Parse(InvalidParseString);
                    // Return true.
                    return true;
                }
                catch (Exception)
                {
                    return false;
                }
            });
        }
    }
}
On my machine, returning a value instead of catching an exception takes
~0.5902 ticks per iteration. Throwing an exception and catching it takes
~168.4368 ticks per iteration. Returning a value is about 285 times
faster, and you will see a gap of this size on any machine (which justifies
the statement "orders of magnitude", since it is more than two orders of
magnitude faster).
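One caveat on reading these numbers: Stopwatch.ElapsedTicks counts timer ticks of 1/Stopwatch.Frequency seconds, so "ticks per iteration" is not directly comparable across machines, although the ratio (168.4368 / 0.5902, about 285) is, because both figures use the same unit. The sketch below is a hypothetical Bench helper, not part of the original program: it converts to nanoseconds and makes one untimed warm-up call so that one-time JIT cost is not folded into the average.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper, a variant of Test() from the program above.
public static class Bench
{
    // Converts Stopwatch ticks (units of 1/Stopwatch.Frequency seconds) to nanoseconds.
    public static double TicksToNanoseconds(double stopwatchTicks)
    {
        return stopwatchTicks * 1e9 / Stopwatch.Frequency;
    }

    // Average nanoseconds per call of test, after one untimed warm-up call.
    public static double NanosecondsPerCall(Func<bool> test, int iterations)
    {
        test(); // Warm-up: keeps JIT compilation out of the timed loop.
        GC.Collect();
        Stopwatch s = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            test();
        }
        s.Stop();
        return TicksToNanoseconds(s.ElapsedTicks) / iterations;
    }
}
```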
Your experience probably tells you that in a one-off situation the
difference isn't noticeable, and you are right, it isn't; but that doesn't
mean returning a value isn't significantly faster than the other options
available.
- Nick
-----Original Message-----
From: Granroth, Neal V. [mailto:[email protected]]
Sent: Thursday, January 14, 2010 3:37 PM
To: [email protected]
Subject: RE: at least one doc
Experience shows otherwise.
- Neal
-----Original Message-----
From: Nicholas Paldino [.NET/C# MVP] [mailto:[email protected]]
Sent: Thursday, January 14, 2010 2:24 PM
To: [email protected]
Subject: RE: at least one doc
DIGY,
In .NET it's more than small. Throwing and catching an exception
(which is what's required here) is orders of magnitude slower than just
returning a value. It has to do with unwinding and restoring the stack, and
I'm sure it's similar in Java.
- Nick
-----Original Message-----
From: Digy [mailto:[email protected]]
Sent: Thursday, January 14, 2010 2:44 PM
To: [email protected]
Subject: RE: at least one doc
I have also wondered many times why HitCollector.Collect doesn't return a
boolean value indicating that no more results are needed.
Maybe it would give a small performance improvement.
DIGY
-----Original Message-----
From: Nicholas Paldino [.NET/C# MVP] [mailto:[email protected]]
Sent: Thursday, January 14, 2010 8:09 PM
To: [email protected]
Subject: RE: at least one doc
Wow, that's just... Horrible from a design perspective. Doesn't
matter which language it's implemented in.
- Nick
-----Original Message-----
From: Digy [mailto:[email protected]]
Sent: Thursday, January 14, 2010 12:22 PM
To: [email protected]
Subject: RE: at least one doc
The formal way is to throw an exception in HitCollector.Collect to stop the
iteration.
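For the original question, the exception-based approach might look roughly like this sketch. StopCollectingException, AnyHitCollector and AnyHitSearch are hypothetical names; the foreach loop only stands in for the searcher's internal loop that calls Collect, and in real code the catch would wrap the Search call itself.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical exception type used only to break out of the search loop.
public class StopCollectingException : Exception
{
}

// Stand-in for a HitCollector subclass: records that a hit exists,
// then throws to abort the search immediately.
public class AnyHitCollector
{
    public bool Found { get; private set; }
    public int Calls { get; private set; }

    public void Collect(int doc, float score)
    {
        Calls++;
        Found = true;
        throw new StopCollectingException();
    }
}

public static class AnyHitSearch
{
    // matches is an assumed stand-in for the docs a query/filter would produce.
    public static bool HasAnyMatch(IEnumerable<int> matches, AnyHitCollector collector)
    {
        try
        {
            foreach (int doc in matches)
            {
                collector.Collect(doc, 1.0f);
            }
        }
        catch (StopCollectingException)
        {
            // Expected: the collector aborted after the first hit.
        }
        return collector.Found;
    }
}
```

The cost is one throw/catch per query, paid only when a match exists; the discussion above is about whether that beats letting the loop run to completion.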
DIGY
-----Original Message-----
From: Artem Chereisky [mailto:[email protected]]
Sent: Thursday, January 14, 2010 1:16 AM
To: [email protected]; [email protected]
Subject: at least one doc
Hi,
Given a boolean query and/or a filter, what is the best way to see if there
is at least one matching document?
I tried a simple hit collector which sets a flag on the first call to the
Collect method. Ideally I would want to stop collecting at that point, but I
couldn't find a way of doing that.
I also tried: TopDocs docs = _searcher.Search(query, filter, 1), but it
seems to iterate through all matches, as docs.totalHits is set to the actual
number of matches.
So, is there a better way?
Regards,
Art