Re: Distributed Fulltext?

David Axmark Wed, 13 Feb 2002 05:13:45 -0800

On Tue, 2002-02-12 at 15:38, Steve Rapaport wrote:
> David Axmark writes:
> 
> > So the standard answer with Apples and Oranges certainly applies here!
> 
> More like Äpplen och Apelsiner, that is, different but similar.  You Swedish 
> guys should know.  Thanks for answering, David, I appreciate the attention
> from a founder.
> 
> I also appreciate your point that Google is updating continuously and
> therefore not always caught up to the current state of the web.
> But isn't that a problem with indexing speed, not with search speed?
> Their search speed is still amazing, as is Altavista, and most of the
> other engines.
> 
> Your other point about exact vs. approximate answers is unclear. I expect
> that Google's answers are exact for their currently available indexes at any
> given time.  But even if they are approximate, I'd be happy with that too.
> The scoring on a FULLTEXT search in MySQL is "exact" but based on a
> formula that is approximate anyway.


No, MySQL returns all the data matching a search. Web engines return
what they find on one search machine. So you can get different
results with Google every time you hit refresh if you are routed to
different machines. This has happened to me when I was looking at the
number of matches and not the results themselves.

So we should try to make fulltext searches with a limit between 10 and
100 fast, to be closer to Google.
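
To illustrate why a limit helps, here is a minimal Python sketch. It is
not how MySQL implements FULLTEXT; the scoring function, the function
names and the sample documents are all made up for illustration. The
point it shows is only that a search asked for the best N rows can keep
a small heap of N candidates instead of fully sorting every match.

```python
import heapq

def score(doc, terms):
    # Hypothetical relevance score: number of query-term occurrences.
    words = doc.lower().split()
    return sum(words.count(t) for t in terms)

def search_all(docs, terms):
    # "Exact" answer: every matching document, fully sorted by score.
    hits = [(score(d, terms), i) for i, d in enumerate(docs)]
    hits = [h for h in hits if h[0] > 0]
    return sorted(hits, key=lambda h: (-h[0], h[1]))

def search_limited(docs, terms, limit):
    # Limited answer: only the best `limit` matches, kept in a bounded
    # heap, so the full match list is never sorted.
    hits = ((score(d, terms), i) for i, d in enumerate(docs))
    matching = (h for h in hits if h[0] > 0)
    return heapq.nsmallest(limit, matching, key=lambda h: (-h[0], h[1]))

docs = [
    "mysql fulltext search",
    "google search engine search",
    "apples and oranges",
    "fulltext fulltext fulltext index",
]
terms = ["fulltext", "search"]
top2 = search_limited(docs, terms, 2)
```

The limited search returns the same leading rows as the full search; it
just does less work once the number of matches is much larger than the
limit.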

I have also heard about some other things web search engines do, since I
know some people at FAST, but I have forgotten them already.

> I'll summarize this thread best I can.
> 
> From the math I used, we started with my estimate of 10^9,
> which was mistaken.  The real figure was 10^6, that is, Google
> searches fulltext about a million times faster than Mysql.
> Then we used Google's 10000 machines +DRAM indexing to reduce the
> gap to 10^2, or 100 times faster.  

I would say we should reduce it even further, but that could be
discussed.

> It turns out that 100 times is about the factor that is causing
> my application problems.  If it just ran 100 times faster it would be
> about as fast as a regular indexed search, and I'd
> be happy.
> 
> A few people suggested that Mysql shouldn't try to be faster,
> I (and some high-support customers like Mike Wexler) disagreed.
> And Alex Aulbach, bless him, actually did his homework and showed that
> things could be much improved with smart index techniques like
> inverted files.

We will try to make every feature as good as possible. But we do have
limited resources.

> Then Sergei Golubchik wrote back to say he had taken some of the good ideas
> and inserted them into the TODO list, although he had higher priorities
> at the time.
> 
> And I was satisfied for now, although my application still isn't working
> satisfactorily due to a really slow and CPU-hungry FULLTEXT search.

Well, there is always the option of sponsoring further fulltext
development. We have a guy who has been working on the GNU fulltext
engines who is interested in working on MySQL fulltext. But for the
moment we cannot afford it.

So if some of you are interested in sponsoring this (or know about
others who might be) write to [EMAIL PROTECTED]

> I think that's our story so far.
> 
> Steve Rapaport
> Director, Technical Services
> A-Tono


---------------------------------------------------------------------
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/           (the list archive)

To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
