<http://canonical.org/~kragen/search-comparison-2009.html>
Some guy from ask.com just made the totally implausible claim that their search results are “just as good if not better” than Google’s, and that their search engine also had another advantage: they were willing to put paid advertising someplace Google wouldn’t (specifically, on searches about abortion). So I thought I would do a comparison.

Here are the last ten Google queries from my browser history:

1. [morning-after pill]
2. [len tower lawnmower]
3. [melting point of solder]
4. [melting point of silicon]
5. [david phillip oster]
6. [1998 blogs], with a drill-down to [1998 weblogs] and [history of weblogs]
7. [emacs tags file syntax], with drill-down to [emacs tags table syntax] and [site:www.gnu.org emacs tags table syntax]
8. [Eric Stoltz]
9. [cytocomputer]
10. [zHosting Ltd]

I evaluated them on Google, Ask.com, Yahoo Search, and Bing. I more or less have ads turned off with AdBlock Plus and NoScript, and I’m viewing everything in Firefox 3.0 with Gnash for my Flash player. So there may be annoyances that affect other people but not me.

Summary
-------

So here are the grades for the different queries:

* [morning-after pill]: Google **B**, Ask.com **B-**, Yahoo Search **D**, Bing **F**.
* [len tower lawnmower]: Google **A**, Ask.com **A**, Yahoo Search **A+**, Bing **A**.
* [melting point of solder]: Google **A**, Ask.com **B**, Yahoo Search **A+**, Bing **C**.
* [melting point of silicon]: Google **A+**, Ask.com **A**, Yahoo Search **D**, Bing **C**.
* [david phillip oster]: Google **C**, Ask.com **B**, Yahoo Search **A+**, Bing **B**.
* [1998 blogs]: Google **D**, Ask.com **D**, Yahoo Search **B**, Bing **C**.
* [emacs tags file syntax]: Google **F**, Ask.com **F**, Yahoo Search **F**, Bing **F**.
* [Eric Stoltz]: Google **A**, Ask.com **C**, Yahoo Search **B**, Bing **A+**.
* [cytocomputer]: Google **B**, Ask.com **D**, Yahoo Search **D**, Bing **F**.
* [zHosting Ltd.]: Google **A**, Ask.com **A+**, Yahoo Search **A**, Bing **B**.

**Google**’s median grade is **A- or B+**, the best of the four. It only failed on a query where all four search engines failed. However, it was only the best search engine of the four **30%** of the time. It was clearly better than the others at dealing with a controversial topic and at providing search results from beyond the Web: books and academic papers.

**Ask.com**’s median grade is **B**. It, too, only failed on the query where all four search engines failed. Its results were worse than Google’s 50% of the time, equally good 30% of the time, and better than Google’s 20% of the time. So the claim by the guy from Ask.com isn’t as implausible as it appeared at first, but it still isn’t true for my query mix. It was only the best search engine of the four **10%** of the time. I’m really surprised at how well Ask.com did, because I always thought of their search engine as a joke.

**Yahoo Search**’s median grade is **B**. It, too, only failed on the query where all four search engines failed. It was the best search engine of the four **40%** of the time, more than any other search engine, so I am going to switch to it as my default search engine. Head to head against Google, though, it was better 40% of the time, equally good 20% of the time, and worse 40% of the time.

**Bing**’s median grade is **C**, the worst of any engine, and unlike any other engine, it failed badly on two of the nine queries the other search engines were able to answer: in one case by privileging misinformation and scaremongering over reliable information, and in a second case by simply failing to find anything relevant. It was the best search engine of the four only **10%** of the time, like Ask; that was on a celebrity query. I’m sad to say this because my friend Barney Pell has been working really hard on it for years, but Bing’s performance is pathetic.
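The summary statistics can be recomputed from the letter grades. Below is a minimal Python sketch that does so. The mapping of letters to points (A+ = 4.3, A = 4.0, A- = 3.7, B+ = 3.3, B = 3.0, B- = 2.7, C = 2.0, D = 1.0, F = 0.0) is an assumption for illustration, since the grades were assigned by eye rather than from numeric scores. Under that mapping the medians come out at 3.5 for Google (halfway between B+ and A-), 2.85 for Ask.com (halfway between B- and B), 3.0 for Yahoo Search, and 2.0 for Bing; the “best of the four” shares are 30%, 10%, 40%, and 10%; and the better/equal/worse splits against Google are 20/30/50 for Ask.com and 40/20/40 for Yahoo Search.

```python
from statistics import median

# Letter grade -> points. This scale is an assumption for illustration;
# the grades in the post were assigned by eye, not from numeric scores.
POINTS = {"A+": 4.3, "A": 4.0, "A-": 3.7, "B+": 3.3, "B": 3.0,
          "B-": 2.7, "C": 2.0, "D": 1.0, "F": 0.0}

ENGINES = ["Google", "Ask.com", "Yahoo Search", "Bing"]

# Grades from the summary list, one row per query, in ENGINES order.
GRADES = {
    "morning-after pill":       ["B", "B-", "D", "F"],
    "len tower lawnmower":      ["A", "A", "A+", "A"],
    "melting point of solder":  ["A", "B", "A+", "C"],
    "melting point of silicon": ["A+", "A", "D", "C"],
    "david phillip oster":      ["C", "B", "A+", "B"],
    "1998 blogs":               ["D", "D", "B", "C"],
    "emacs tags file syntax":   ["F", "F", "F", "F"],
    "Eric Stoltz":              ["A", "C", "B", "A+"],
    "cytocomputer":             ["B", "D", "D", "F"],
    "zHosting Ltd.":            ["A", "A+", "A", "B"],
}


def scores(engine):
    """Grade points for one engine across all ten queries."""
    i = ENGINES.index(engine)
    return [POINTS[row[i]] for row in GRADES.values()]


def median_points(engine):
    """Median grade points; with ten queries this averages the 5th and 6th."""
    return median(scores(engine))


def best_share(engine):
    """Fraction of queries where `engine` beat every other engine outright.

    Ties (such as the all-F Emacs query) count as a win for nobody.
    """
    i = ENGINES.index(engine)
    wins = 0
    for row in GRADES.values():
        pts = [POINTS[g] for g in row]
        if all(pts[i] > p for j, p in enumerate(pts) if j != i):
            wins += 1
    return wins / len(GRADES)


def head_to_head(engine, rival):
    """(better, equal, worse) fractions of `engine` compared with `rival`."""
    mine, theirs = scores(engine), scores(rival)
    n = len(mine)
    better = sum(a > b for a, b in zip(mine, theirs))
    equal = sum(a == b for a, b in zip(mine, theirs))
    return better / n, equal / n, (n - better - equal) / n


for e in ENGINES:
    print(f"{e}: median {median_points(e):.2f}, best of four {best_share(e):.0%}")
print("Ask.com vs Google:", head_to_head("Ask.com", "Google"))
print("Yahoo Search vs Google:", head_to_head("Yahoo Search", "Google"))
```

Because the 4.3 for A+ and the 0.3 steps for plus and minus are arbitrary, the exact medians shift a little under other scales; the “best of the four” and head-to-head counts depend only on the ordering of grades, so they do not.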
(The percentages of “best of the four”, 30% + 10% + 40% + 10%, add up to only 90%; that’s because one of the ten queries was failed by all four search engines, and in that case none was “the best”.)

So there isn’t really a clear winner; Yahoo Search, Google, and Ask.com are pretty even overall, even though some did much better than others on particular queries. There is a clear *loser*, though: Bing. Maybe I should have included Cuil to make Bing look better. I mean, I feel kind of bad. (Actually, I did try [morning-after pill] and [david phillip oster] on Cuil. It did better than Bing.)

The rest of this document (4000 words) is taken up with explanations of the particular queries.

[morning-after pill]
--------------------

Here I wanted to see if I could find accurate information about emergency contraception without having to cope with abortion-scare sites providing misinformation.

Google:

* hit 1 is Wikipedia: ideal; explains both sides of the debate objectively, along with lots of detailed information.
* hit 2 is morningafterpill.org, an abortion-scare site: not so good. However, the snippet says, “Site asserts that ‘morning after’ emergency contraception is just another abortion approach that kills a human life.”, so it doesn’t come as a shock.
* hit 3 is morningafterpill.org also, with health-scare information which is not actually accurate. Not good.
* hit 4 is news results, saying, “Legal fight continues on sale of ‘morning after’ pill”.
* hit 5 is some UK site with what appears to be accurate information.

Ask.com:

* hit 1 is something on healthline.com, with apparently accurate information and an unhelpful blurry IUD diagram.
* hit 2 is Google hit #5.
* hit 3 is Google hit #2. Not good.
* hit 4 is getthepill.com, apparently an online OTC pharmacy for morning-after pills.
* hit 5 is Google hit #1, Wikipedia.

Yahoo Search:

* provides lots of drop-down suggestions before I even finish typing the search query!
* hit 1 is Google hit #2, with the more misleading snippet, “Rejects ideas that the Morning After Pill is not an abortifacient and argues instead that MAP use is tantamount to abortion. From the American Life League.”. Very bad.
* hit 2 is Google hit #3.
* hit 3 is Google hit #1, Wikipedia.
* hit 4 is Google hit #4.
* hit 5 is a Mayo Clinic page.

Bing:

* hit 1 is Google hit #2, very bad.
* hit 2 is a dictionary definition. Worthless.
* hit 3 is the Mayo Clinic page.
* hit 4 is another page from morningafterpill.org, but with no visual indication that it’s the same site or that it doesn’t have reliable information. Very bad.
* hit 5 is from sexuality.about.com. Similar to the Wikipedia page, but shorter, and it doesn’t cover the controversy at all.

Grades on this query: Google B, Ask.com B-, Yahoo Search D, Bing F.

[len tower lawnmower]
---------------------

I wanted to find a photo of Len Tower on a human-powered riding mower that I had seen a few days ago.

Google: hit #1 is a page with the photo and background information, instantly recognizable as such.

Ask.com: same.

Yahoo Search: same, but hits #2 and #3 are also about it, with more information.

Bing: same as Google.

Grades on this query: Google A, Ask.com A, Yahoo Search A+, Bing A.

[melting point of solder]
-------------------------

I wanted to find out the melting point of traditional eutectic lead-tin solder as well as the melting points of common modern RoHS-compliant solders.

Google:

* hit 1 is the Wikipedia page for “Solder”, which is a very general page with a uselessly wide range in the snippet.
* hit 2 is the Wikipedia page for “Soldering”.
* hit 3 is “RF Cafe - Solder Properties Melting Point”, with the snippet “These values are for some of the most common solders...”, so I clicked on that. It has precise melting points for rather more solders than I wanted, but I got the information I needed.
Ask.com:

* hit 1 is some journal article from 1996 about a new solder formulation that, as far as I know, nobody uses today. Useless.
* hit 2 is Google hit #1.
* hit 3 is Google hit #2.
* hit 4 is “EPE ‘Basic Soldering Guide’”, which says in the snippet, “The melting point of most solder is in the region of 188°C (370°F) and the iron tip temperature is typically 330-350°C (626°-662°F). The latest lead-free solders typically require a higher temperature.”. You would think this was better, but if you follow the link to the (rather large) page, it never actually tells you what the higher temperature is.
* hit 5 is Google hit #3.

Yahoo Search says, “Did you mean: melting point of soldier?”

* hit 1 is Google hit #1, with a nice little graphic.
* hit 2 is Google hit #2, with the same nice little graphic.
* hit 3 is somebody asking a question about what kind of solder was in common use in 1969, and how hot it melts.
* hit 4 is some sort of “tips on soldering” page.
* hit 5 is Ask.com hit #1.
* hit 7 actually looks promising, but has only questions and no answers.

So I followed the Wikipedia link, and it has the answer for eutectic lead-tin solder above the fold and a section on “lead-free solders” with a whole big discussion of which ones are most common and what their melting points are. So I probably should have followed that link from Google instead of hit #3.

Bing:

* hit 1 is a brand new US patent on a type of solder. Trash.
* hit 2 is another one. Trash.
* hit 3 is Ask.com hit #1. Trash.
* hit 4 is a 2007 article by Zhenhua Chen about lower-temperature lead-free solder formulations that aren’t yet in wide use, also summarizing the melting points of the widely-used modern solders and their various advantages and disadvantages. Pure gold. (The article, metaphorically speaking, not the solders.)
* hit 5 is the Wikipedia page.

Grades on this query: Google A, Ask.com B, Yahoo Search A+, Bing C (it would be an F except for hit 4).
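The Fahrenheit figures in the EPE snippet are plain conversions of its Celsius figures, and the same kind of arithmetic turns Google’s 1687 K answer for the next query, [melting point of silicon], into degrees Celsius. A small Python sketch using only the standard formulas (°F = °C × 9/5 + 32 and °C = K − 273.15); the function names are illustrative:

```python
def c_to_f(celsius: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


def k_to_c(kelvin: float) -> float:
    """Convert kelvins to degrees Celsius."""
    return kelvin - 273.15


# Temperatures quoted in the EPE snippet (Ask.com hit 4 above).
for c in (188, 330, 350):
    print(f"{c} °C = {c_to_f(c):.0f} °F")

# Google's answer box for [melting point of silicon].
print(f"1687 K = {k_to_c(1687):.2f} °C")
```

It prints 370, 626, and 662 °F for the snippet’s three Celsius values, matching the snippet, and 1413.85 °C for silicon.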
[melting point of silicon]
--------------------------

Google has the answer in big letters above the search results: 1687 K. The Wikipedia article is hit #2, and the correct answer in °C is in hit #4.

Ask.com has the answer in the snippets for hits 1 and 2, and slightly wrong answers in the snippets for hits 4 and 5; hit #3 presumably has it if I click through.

Yahoo Search hit 1 is Wikipedia. Hits 2 and 3 are the wrong answer. Snippets for hits 4 and 5 have the right answer.

Bing:

* hit 1 is a patent. Trash.
* hit 2 is about the melting point of silicon dioxide. Trash.
* hit 3 is roughly a duplicate of hit 2. Trash.
* hit 4 has the answer in a snippet from another Wikipedia page.
* hit 5 is another irrelevant thing about SiO₂.

Grades: Google A+, Ask.com A, Yahoo Search D, Bing C.

[david phillip oster]
---------------------

I wanted to find his home page, and from there his current email address, so I could email him.

Google: no home page, but hits 5-7 look vaguely promising. Hit 5 leads to a blog post that links to <http://groups.google.com/groups/search?q=%22david+phillip+oster%22&start=0&scoring=d>, which does actually link to <http://groups.google.com/group/iphonesdkdevelopment/browse_thread/thread/5c9cd5561d7b0d64/da37b38ede21148d?q=%22david+phillip+oster%22#da37b38ede21148d>, which links to <http://groups.google.com/groups/profile?enc_user=szRVXBsAAABguGT__oukXrijYyXRsYeu3jKajrjPH-s4VDv7fhNHSg>, which says “davidphillipos...@gmail.com”, which is close enough. Hit 7, his Amazon reviewer page, actually has “os...@ieee.org” on the page. Hit 9 links to a RISKS page that gives the email address he had in 1988. In practice I gave up when I saw the page of snippets; instead I searched my email.

Ask.com: hit #2 is Google’s hit #7.

Yahoo Search: turbozen.com is hits #1 and #2, with “os...@ieee.org” in both snippets. Hit #4 is mosaiccodes.com, which links to turbozen.com.
Bing: hit #3 is Yahoo hit #2 (without the email address in the snippet, but clear that it’s his software company), and hit #5 is Google hit #7.

Grades: Google C, Ask.com B, Yahoo Search A+, Bing B.

[1998 blogs]
------------

I was trying to remember the state of the blogosphere in 1998, when I started kragen-tol, in order to justify my claim that it wasn’t very surprising that I didn’t start it as a blog.

Google: top ten hits are all trash — things that happen to be a blog or mention blogs and mention 1998. Hit #11 looks more promising but is also trash. Somewhere around hit #20 there’s [Psychology of Blogs (Weblogs)](http://psychcentral.com/blogs/blog.htm), from 1998, which is a pretty good snapshot of how things were in 1998 — except a little bit polluted by a 2001 update.

Ask.com: same trash as Google, except only ten hits of it. (I have Google set to display 100.)

Yahoo Search: mostly the same trash, but Psychology of Blogs is hit #4. Yahoo Search used to display 20 hits by default, but now it seems it’s down to 10, just like Google.

Bing: hit #1 talks about what the web was like in 1998, in Spanish, but doesn’t shed any light on my actual question, which is what the blogosphere was like in 1998. Hit #2 is the Spanish Wikipedia page for “blog”, which has a pretty good “Historia” (History) section. Hit #7 is somebody’s presentation on SlideShare, which loses pretty badly (not accessible without Flash and fails freakishly in Gnash) but there’s some good information in the title.

None of these really gave me what I was looking for, which was Rebecca Blood’s “History of Weblogs” from 2000, whose title I couldn’t remember. So when I was doing this search “for real”, the first time, instead of looking at hit #20 or trying multiple search engines, I glanced at the page full of trash and reformulated my search. The word “blog” wouldn’t be coined until 1999 (by The Brand Peter Me), and at the time they were called “weblogs”, a term Jorn Barger had invented in 1997 for what are now called “linklogs” or sometimes “microblogs” or “tumblelogs”. So I searched for [1998 weblogs].

On Google, “Psychology of Weblogs” is hit #1, and Jason Kottke’s blog archives for 1998 are hit #3. The snippet for hit #6, from a blog I’d never heard of that ended in 2005, says, “I started this weblog in August 1998, when it was one of the first 25 or so weblogs in existence,” which is a piece of the information I was looking for but not the comprehensive overview of Wikipedia or Blood’s piece.

Ask.com is essentially identical to Google, with the same hits #1 and #3, and Google’s hit #6 moved up to #4. However, it also has a sidebar of “Related Searches”, which includes a suggestion for “history of weblogs”.

Yahoo Search has “Psychology of Weblogs” as hit #1, but also has Blood’s essay as hit #8! Also, hit #4 is “Computer History for 1998”, with some minimal information. Hit #9 mentions that Scripting News’s comments section started in October 1998, and hit #10 is “Jorn Barger, the NewsPage Network, and the Emergence of the Weblog Community”, which offers a somewhat deeper history than even Blood’s essay.

Bing gives essentially exactly the same results as for [1998 blogs].

So, since I was using Google instead of Yahoo Search, I searched a third time for [history of weblogs].

On Google, below the Google Scholar hits, which don’t have enough information on the page to tell me if they’re the right thing, Blood’s article is #1. English Wikipedia articles are the next couple of hits, followed by more articles about the early history of weblogs (1997-2000). Pure gold.

Ask.com gives basically the same results.

Yahoo Search puts Blood’s article at the top, followed by a self-promotional post short on detail by Dave Winer, the Wikipedia article, etc.
Bing gives Blood’s essay at the top, followed by a Spanish Wikipedia article, some random irrelevant stuff, a German page (which I don’t understand), some more irrelevant stuff, and what appears to be an SEO spam page (“Interested in history? At weblogs.hu you find posts and information relevant to history. www.weblogs.hu/posts/tags/history”).

So, grades: Google D, Ask.com D, Yahoo Search B, Bing C. On my earlier queries Yahoo Search did dramatically better than the others, well enough that I wouldn’t have proceeded to the third query and maybe not past the first.

[emacs tags file syntax]
------------------------

I wanted to look up the syntax of Emacs `TAGS` files so I could write a program to generate one (introspectively from the state of a Python program, rather than by parsing a bunch of source code).

This search was originally completely unsuccessful, although I’m not totally stymied; there is one free-software consumer of `TAGS` and two free-software generators of `TAGS` already on my machine, so I can just look at the source. If I’m lucky, it will reference a file format spec.

Google: all of the hits relate to how to invoke `etags`, which generates `TAGS` files, or how to use them in Emacs. The “syntax” being referenced is invariably the syntax of the source files, not of `TAGS` itself (which is apparently called a “tags table”). Most of them are a zillion copies of the Emacs manual and the man pages for `etags` and Exuberant Ctags.

Ask.com: identically useless results, except for a bunch of irrelevant “Related Searches” at the top.

Yahoo Search: same.

Bing: same.

My next attempt was to be more specific in my query: I’m looking for information about the *tags table*. In retrospect, I should have looked for information about the “file format”, not “syntax”, but my next search was [emacs tags table syntax]. All four search engines give basically the same results as before.
So my next attempt was to click on “more results from www.gnu.org »”, with the thought that this would give me each section of the Emacs manual only once, and many more of them. It did, on Google, but the Emacs manual does not contain the answer. I am not trying the query on the other search engines. Searching for [emacs tags table format] does not seem to help.

I thought I would try using natural-language search on Ask.com and Bing. [how do i generate an emacs tags table?] on Ask.com yields mostly `etags` man pages, but also a link to <http://www.emacswiki.org/cgi-bin/wiki/EmacsTags>, which doesn’t help but is usually a better resource than the Emacs manual. Bing has it at the top.

Grades: Google F, Ask.com F, Yahoo Search F, Bing F.

[Eric Stoltz]
-------------

I had read that Eric Stoltz had been originally cast in Back To The Future, and I wondered who he was.

Google gave me four photos of him at the top, which was sufficient for me to know I didn’t recognize him. Hit #1 was his IMDB page and hit #3 was the Wikipedia page, which outlined his acting career in sufficient detail to satisfy me.

Ask.com has a bunch of irrelevant “related searches” at the top, followed by product images from Amazon which are too small to see the guy’s face. Then there’s the IMDB page, some TV listings for ZIP code 10010 in the US (utterly pathetic; I’m in Argentina), and then a Wikipedia page with a too-small image.

Yahoo Search has only three photos, smaller than Google’s, but they’re recognizable. The top few hits are from IMDB and Wikipedia.

Bing has six photos, including a closeup shot, which are highly recognizable. Then the top hit is some other Eric Stoltz who’s a web designer, followed by Wikipedia entries from English and Spanish, an IMDB page, and then a French Wikipedia article.

Grades: Google A, Ask.com C, Yahoo Search B, Bing A+.

[cytocomputer]
--------------

I wanted to know what had been written recently about Bob Lougheed et al.’s image processing device.
Google:

* hit 1: The abstract of Lougheed and McCubbrey’s 1980 paper, without the full text. Fail.
* hit 2: Some paper from 1982 that referenced it, also without the full text. Fail.
* hit 3: Thesaurus.com. Not just fail but spam; thesaurus.com (an ask.com service) is wasting Google users’ time by directing them to a page that says, “No results found for *cytocomputer*: Did you mean strumpet?” No, I certainly did not.
* hit 4: a 2001 book on Google Books about image processing, describing the Cytocomputer architecture in the context of image processing architectures of the time. OK.
* hit 5: another book on Google Books, this one from 1993.
* hit 6: from IEEE Xplore: an abstract, without the text, of a paper from 2001 that referenced it. Fail.
* hit 7: also from IEEE Xplore: the same paper as hit 2, again without the text, and also with the wrong title. Fail.
* hit 8: a spam page from reference.com (an ask.com service), saying, “No results found for *cytocomputer*: Did you mean supercomputer (in dictionary) or Cart computer (in reference)?”
* hit 9: the full text of the paper from hit 6, which turns out to be a 2001 emulation of the Cytocomputer in an FPGA, getting a 10× speedup over the software emulation they had been using. This is the version of the paper that was submitted to the government sponsors and thenceforth freely disseminated. MADE OF WIN.
* hit 10: the full text of Barry Bruce Megdal’s 1983 dissertation on VLSI fingerprint recognition. WIN. Particularly impressive since the PDF contains no text; it’s scanned from prints.

Later Google hits include crap from linkinghub.elsevier.com, expired US patents describing the Cytocomputer in some detail, and so on.

So even though 60% of the top 10 Google hits are basically spam (duplicate teasers from ACM and IEEE, and ask.com SEO spam pages), there’s some good stuff in there. Also, Google offers “Cited by 57” on the original Cytocomputer paper.
Among other things, that links me to the Cheops paper from 1995 and the 400-page Image Algebra book from 1986. These only mention the Cytocomputer in passing, but they look pretty interesting.

Ask.com:

* hit 1: Google hit #1. Fail.
* hit 2: Google hit #2. Fail.
* hit 3: Google hit #6. FAIL.
* hit 4: Google hit #7. FAIL.
* hit 5: Google hit #9. MADE OF WIN.
* hit 6: Google hit #16, one of the patents. OK.
* hit 7: crap from linkinghub. Fail.
* hit 8: a European Cytocomputer patent, probably a dupe of one of the US patents. OK.
* hit 9: a DBLP conference proceedings page for ISCA 1980, which included the paper that is hit #1 and Ask.com hit #3. OK.
* hit 10: some crap from ingentaconnect that offers to sell you Google hit #9 for US$47.00 plus tax. FAIL.

So Ask’s first ten results are almost indistinguishable from Google’s, except:

1. They’re 90% garbage instead of 60%;
2. They omit the spam pages produced by Ask.com properties like reference.com and thesaurus.com;
3. They don’t have Google Books hits (naturally);
4. As a result of lacking Google Books and spam from Ask.com, hit #9 (the jackpot) moves up to hit #5.

Yahoo Search:

* hit 1: some PDF from gaianxaos.com. It’s 9MB, so I clicked the “view as HTML” link, which didn’t work.
* hit 2: apparently the same PDF from quantumconsciousness.org, which makes me suspect that the paper is written by a nutcase. It turns out to be a 272-page book that seems mostly sane but is primarily concerned with the nature of consciousness, and therefore is somewhat speculative. It mentions the word “cytocomputer” once in the title of Chapter 5 but never explains what it means in the text.
* hit 3: an IEEE page without the full text of some paper about CLIP7A.
* hits 4 and 5: HTML and PDF versions of the Cheops paper I got off Google Scholar.
* hit 6: a blog comment I made last year about unusual computing hardware, which might be interesting to anybody interested in the Cytocomputer, except me.
* hit 7: Chip Morningstar’s resume.
  He worked on software for the Cytocomputer in the early 1980s.
* hit 8: a copy of some paper citing the Cytocomputer that somebody uploaded to “docstoc”, maybe the Image Algebra book. The page has broken Flash on it and offers to let me download the document if I register.
* hit 9: Ask.com hit #9.
* hit 10: A mailing list post of mine from 2005.

So Yahoo Search found a lot of interesting stuff, but it’s only marginally related to the Cytocomputer. I guess I should be flattered that two things I wrote are in the top 10, but I’m more frustrated than flattered. The most relevant items — the US patent and the 2001 Cytocomputer emulation in an FPGA — are missing entirely.

Bing:

* hit 1: a variant of Google hit #1 but with a useless snippet and two-word title. FAIL.
* hit 2: citations for the 1980 paper from CiteSeer. CiteSeer finds 10 to Google Scholar’s 57, but they’re 10 that it’s guaranteed to have downloadable copies of. Unfortunately none of them look like they say anything interesting about the Cytocomputer. Fail.
* hit 3: some teaser page from IEEE Xplore. FAIL.
* hit 4: something from CiteSeer with no title or author; it turns out to be a 100-page chunk from the middle of some book on image processing; I think it’s the “Image Algebra” book I got from Google Scholar. FAIL.
* hit 5: The Cheops paper via CiteSeer. OK.
* hit 6: A 1988 ERIM paper on a use of the Cytocomputer with a Symbolics 3600 for machine vision for automated orbital navigation. OK.
* hit 7: Yahoo Search hit #10.
* hit 8: The Cheops paper, not via CiteSeer. OK.
* hit 9: some teaser page from ACM. FAIL.
* hit 10: Ask.com’s hit #9, the DBLP page.

So Bing basically gave me none of what I want.

Grades: Google B, Ask.com D, Yahoo Search D, Bing F.

I wish I could give Ask.com an F for spamming Google’s search results, but that wouldn’t accurately represent the quality of their own search results, which is what is at issue here. If they get successful enough at it, I guess I’ll have to stop using Google, after all.
[zHosting Ltd.]
---------------

Charlie Stross wrote about his attempt to start up a virtual Linux hosting company on an IBM mainframe in 2000. Before I got to the part where the company folded before even getting angel funding, I searched to see what the company was up to now. So “success” in this search would be a clear statement that the company had folded without customers or revenue.

On Google, hit 4 is Charlie’s story of the company. None of the other top 10 or 20 hits suggest that zHosting Ltd. of the UK has ever existed. This is somewhat confused by some guy who uses “zHosting” as his screen name when posting on webmaster-oriented forums, including some that are related to virtualization.

Ask.com has Charlie’s story as hit 2.

Yahoo Search doesn’t have Charlie’s story, but its hit #1 is from checksure.biz, which lists a zHosting Ltd. at 54 Easter Road, Edinburgh, Midlothian EH7 5RQ. I’m pretty sure that’s Charlie’s company. It offers to sell me a “report” on the company for £9.95. I’m not sure whether I should treat this as a spectacular success (I got the incorporation address of a company that folded in 2000 and never had a customer!) or a failure to filter spam (somebody tried to charge me US$15 for a “report” on a company that folded in 2000 and never had a customer!)

Bing doesn’t have Charlie’s story or anything interesting, just the guy who posts on web forums.

Grades: Google A, Ask.com A+, Yahoo Search A, Bing B.