OLPC's Dictionary -- A Software Review

StarDict has been chosen by the OmegaWiki and OLPC projects as the program to display and facilitate querying of dictionary data. Over the last few weeks, I have read the documentation, played with StarDict, and compared it with ZBEDic (which has recently begun to impress me a great deal). This comparison has led me to believe that both programs should be ported to Sugar (or combined into a new OLPC dictionary) and that the two programs have different strengths and weaknesses. A comparison of these programs is, therefore, certain to benefit both OLPC and Free Software users everywhere.

StarDict -- An Overview

StarDict is, in typical *nix fashion, a very versatile and useful program. It supports Dict.org style dictionaries, can load and search multiple dictionaries simultaneously, and can display definitions from many of them at once.

It is fast. Even with ten or more dictionaries loaded, there appears to be no degradation in response time. Amazingly, it also has a "scan" function which defines any word highlighted with the mouse. This is the closest thing I have yet seen on any platform to Dr. Eye (one of the few proprietary programs I would truly call awesome: it automagically defines -- in two languages! -- any text on the desktop that is moused over, showing the definitions in popup windows).

StarDict's Drawbacks

Given all that, you are probably downloading and installing StarDict as you read this. However, StarDict is also classically *nix in its glaring weaknesses.

Philosophy

Part of this has to do with the Dict.org philosophy which leans toward an always connected mentality, as opposed to the on again off again mindset imposed by the early internet. Unfortunately, the OLPC is most likely going to operate in the latter environment -- or, at the very least, must be designed to. Dict.org is designed for experienced system administrators, certainly not for children. The dictionary format that StarDict uses seems to require at least the completion of an undergraduate comp-sci degree to implement.  So much for kids adding or creating their own dictionaries. If this were not enough, there is no graphical or even user level method to add dictionaries.

No Included Dictionaries

No dictionaries -- not even Public Domain ones -- come with the StarDict packages, and there are no Debian packages of dictionaries. As far as I know, there are no Fedora/Red Hat packages, either.  Adding dictionaries requires root access to the system. This really is an absurd requirement for a children's computer -- it is even absurd for a desktop computer (in fact, this requirement was so onerous that, until a month ago, I had given up every one of my many attempts to use this software over the last few years).

Website

The StarDict website is also nearly impossible to navigate. I had to use an anonymizer just to get Google to let me complete the search and show me the dictionary list (For some reason, clicking the dictionaries link brings up a page with a Google search. Searching causes Google to accuse the user of being a bot and doing something illicit.  Maybe it is just my connection, but why should I have to use Google to find a page of dictionaries within the site I was already visiting?).

Documentation

StarDict's documentation is not exactly comprehensive. For compiling dictionaries, there appear to be exactly two text files: one describing the StarDict dictionary format in programmerese, and another containing terse tips on compiling dictionaries. Both are practically unreadable to a non-programmer. This is to say nothing of documentation on how to use the program itself -- the program is of course quite complicated and not always intuitive.

Failure to Read Archives

In addition to the above problems, downloaded dictionaries have to be extracted before they will work. Each archive contains a folder with three files. StarDict can read the dictzip compressed file (.dict.dz), but not the .tar.bz2 archive containing all three files. I find it absurd that a program that already handles compressed data cannot read from a larger archive containing three files. This extra step is probably assumed to be easy for seasoned system administrators but is hardly so for ordinary users -- especially ones that are in the process of learning.  The .tar.bz2 archives should not have to be extracted, or, if decompression would slow down the program too much, an uncompressed archive (i.e. just .tar or something similar) should be used since the dictionaries are compressed anyway.
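To illustrate how little is needed to read the three files straight out of an archive, here is a minimal Python sketch using the standard tarfile module. It is not how StarDict works internally; the function names and the demo file names are hypothetical, and the list of suffixes is an assumption based on the files described above.

```python
import io
import tarfile

# Suffixes assumed from the files found in a downloaded dictionary folder.
DICT_SUFFIXES = (".ifo", ".idx", ".dict.dz", ".dict")

def read_dictionary_files(fileobj):
    """Return {member name: bytes} for dictionary files inside a tar archive.

    Mode "r:*" lets tarfile detect the compression (bz2, gz or none),
    so nothing is extracted to disk.
    """
    found = {}
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        for member in tar:
            if member.isfile() and member.name.endswith(DICT_SUFFIXES):
                found[member.name] = tar.extractfile(member).read()
    return found

def make_demo_archive():
    """Build a tiny in-memory .tar.bz2 with placeholder contents (demo only)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:bz2") as tar:
        for name, data in [("demo/demo.ifo", b"ifo"), ("demo/demo.idx", b"idx"),
                           ("demo/demo.dict.dz", b"dz"), ("demo/README", b"x")]:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf
```

Calling read_dictionary_files(open(path, "rb")) on a downloaded archive would return the three files as byte strings, ready to be handed to whatever code parses them.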

Ugly Display

StarDict's tooltip and windowed definition presentation is kludgy. Instead of treating a paragraph as a unit, it renders lines individually and indents new lines, giving the impression of random placement -- some are indented, some are not. This is not only ugly but also makes reading quite difficult, especially in the tooltip display.

Selection Problems

The scan feature requires text to be highlighted carefully by hand and does not automagically use spaces to separate words. As a result, StarDict often attempts to look up word fragments or even multiple word fragments. As a word is being highlighted, it attempts to define almost every group of letters until the word is completed. In this way, the program's speed actually works against it, making highlighting more difficult. Then again, this feature also makes phrase lookups possible.
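The fix I have in mind is simple: grow whatever was highlighted outward to the nearest whitespace before looking it up. A rough Python sketch follows; the function name is hypothetical, and it assumes the program knows the surrounding text and the selection offsets, which may not be true of StarDict's scan feature.

```python
def expand_to_words(text, start, end):
    """Grow the selection text[start:end] outward to the nearest whitespace.

    Partial words become whole words, while a selection spanning several
    words stays a phrase, so phrase lookups keep working.
    """
    while start > 0 and not text[start - 1].isspace():
        start -= 1
    while end < len(text) and not text[end].isspace():
        end += 1
    # Drop punctuation clinging to either end of the selection.
    return text[start:end].strip(".,;:!?\"'()")
```

With this in place, a sloppy highlight covering only part of two adjacent words would be looked up as the two complete words.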

No Non-English Dictionaries

For some reason, there are no non-English dictionaries on the list at StarDict's site. There appear to be many text based dictionary files in other languages on other parts of the web, but they are sprinkled across many different sites and hard to find. These files need to be consolidated so that they can easily be found without having to weed through thousands of dictionary websites and Amazon-like sales entries.

ZBEDic -- An Overview

ZBEDic is a really cool dictionary program for the Zaurus. It is small, fast and integrates well with programs like FBReader (which has been ported to Sugar). FBReader has a feature that allows automagic lookups of any word pressed on the touch screen. Clicking on a special icon returns one to the book being read. It supports multiple dictionaries, and the definitions are nicely formatted. Further, it uses a human readable dictionary format. The dictionary files can be stored anywhere and added with the included file manager functionality (one need not be root to add dictionaries). ZBEDic's definition presentation is pretty and very readable. It not only displays paragraphs nicely but also separates multiple definitions with blank lines so the reader does not have to do extra work mentally separating the elements. It also displays redundant definitions of a single word.

ZBEDic's Drawbacks

While ZBEDic can access a collection of dictionaries, it can only use one at a time. This means the only way to have multiple languages is to compile a dictionary that way or to have two dictionaries.

No Automagic Swapping or Dictionary Combining

Dictionaries must be swapped by hand. There is no reverse lookup within a dictionary. In other words, checking the words contained in a definition in another language or dictionary requires changing dictionaries.

Unidirectional Bilingual Limitation

These two points also make the dictionary unidirectional. An English-Spanish dictionary will not do Spanish-English lookups. All of these problems could, in theory, be solved in the dictionary files (i.e. by compiling a dictionary that contained both English-Spanish and Spanish-English entries, probably with entries from both languages interleaved alphabetically), but such a solution would be difficult to use (to say the least), and I have neither seen nor heard of such a dictionary file. This problem is somewhat mitigated by the fact that switching dictionaries does not affect the input field, so one can display the definitions in as many languages as one might want without going to the trouble of reentering the query item -- assuming, of course, the entry happens to exist in multiple languages. Fortunately, the lack of an entry in a given dictionary does not stomp the input field, so one need not fear accidentally loading the wrong dictionary.
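As a rough illustration of what a reverse direction could be built from, the following Python sketch inverts a one-way word list. It assumes the entries are already available as a mapping from headword to a list of translations, which is not ZBEDic's actual file format, and the function name is hypothetical. The result is only a crude reverse index, not a proper Spanish-English dictionary, since anything beyond bare translations is lost.

```python
from collections import defaultdict

def reverse_entries(entries):
    """Build target -> source entries from {headword: [translations]}."""
    reverse = defaultdict(list)
    for headword, translations in entries.items():
        for translation in translations:
            reverse[translation].append(headword)
    # Sort headwords and their sources so the output is stable.
    return {word: sorted(sources) for word, sources in sorted(reverse.items())}
```

The reversed entries could be kept as a second dictionary rather than mixed into the first, which avoids the interleaving problem described above.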

Affixes Cause Lookups to Fail

The book integration is nice. Any single word can be clicked, resulting in the dictionary coming up with a definition. However, the authors failed to account for affixes. This means plural words or conjugated words result in no definition. Switching the virtual keyboard on or off (on the Zaurus, anyway) triggers a return to the program that initiated the lookup.
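Even a crude fallback would help here: if the exact word is missing, strip a common suffix and try again. The Python sketch below shows the idea for English only. The rule list is a small illustrative sample, not a real stemmer, and the names are hypothetical rather than anything from ZBEDic or FBReader.

```python
# (suffix, replacement) pairs, tried in order; a tiny English-only sample.
SUFFIX_RULES = [("ies", "y"), ("es", ""), ("s", ""),
                ("ing", ""), ("ed", ""), ("ed", "e")]

def candidate_forms(word):
    """Yield the word itself, then guesses at its base form."""
    word = word.lower()
    yield word
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            yield word[: -len(suffix)] + replacement

def lookup(word, dictionary):
    """Return (matched headword, definition), or None if nothing matches."""
    for form in candidate_forms(word):
        if form in dictionary:
            return form, dictionary[form]
    return None
```

Other languages would need their own rules, and irregular forms would still fail, but a clicked plural such as "dictionaries" would at least find "dictionary".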

Limited to Supporting Programs

This feature, as implemented, is also specific to programs that support it and has not been generalized to the entire desktop (then again, touch screen devices really do not have hover capabilities, so this question may be moot -- once again, though, the OLPC does have a pointer device).

One Touch Lookup Not Available in Dictionary Itself

The one touch lookup is not available within definitions in ZBEDic itself. Highlighted lookup is, but a special lookup icon/button must be added to the menubar for it to work.

Many Dictionaries Too Small

ZBEDic's website has a lot of dictionaries available and many appear to be quite nice. Many of the bilingual dictionaries, however, contain very few entries. While normal dictionaries contain 30,000 to 40,000 words, some of these dictionaries have a mere 5,000 entries. These are clearly insufficient. Bilingual dictionaries with at least 30,000 entries must exist in the Public Domain for (at the very least) every combination of European languages and many others.

Problems with/Shortcomings of Both Programs

Desktop Integration

Neither StarDict's scan function nor ZBEDic's book integration works for all text on the desktop (unlike Dr. Eye), and neither allows hover lookups. Both StarDict's scan and ZBEDic's book integration could be improved (see above).

Compatibility Lacking

Dictionaries compiled for one of the programs do not work with the other program. StarDict does not recognise the existence of a .dict.dz file outside of a folder and without the accompanying index files, and ZBEDic gives the error "Database error:entry too long" after attempting and failing to load StarDict's dictionary files. Both of these programs should have the capability to read each other's dictionaries or a mutually compatible format should be created (after all, this is Free Software, not MacroSuck's proprietary universe -- and standards are always nice ;-).

Now that I have more or less stated a lot about what I think the software should be capable of and gotten most of my opinions out in the open, I should talk about where I think this should be headed in the short term.

As I have stated before, I think it is possible to obtain Public Domain dictionaries in practically every major language spoken today and have multilingual capabilities between many of them by this summer.  I am hoping it will be possible to write Perl and/or Python scripts to easily convert text files to dict.dz files by that time (however, if I am alone in this endeavor, it might take considerably longer as I am generally not the best programmer around).
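To give an idea of the kind of script I mean, here is a rough Python sketch that turns a tab-separated word list into the three parts of a StarDict dictionary. Several things here are assumptions: the input layout (one "headword<TAB>definition" per line), the function names, and my reading of the StarDict format notes (index entries made of a NUL-terminated UTF-8 headword followed by a big-endian 32-bit offset and size, sorted ASCII case-insensitively). All of this should be checked against the format file shipped with StarDict before anyone relies on it.

```python
import struct

def parse_tab_separated(lines):
    """Read 'headword<TAB>definition' lines into a dict (assumed input format)."""
    entries = {}
    for line in lines:
        line = line.rstrip("\n")
        if "\t" not in line:
            continue  # skip blank or malformed lines
        word, definition = line.split("\t", 1)
        entries[word.strip()] = definition.strip()
    return entries

def build_stardict(entries, bookname):
    """Return (ifo_text, idx_bytes, dict_bytes) for {headword: definition}."""
    # Assumed sort order: ASCII case-insensitive, byte order breaks ties.
    words = sorted((w.encode("utf-8") for w in entries),
                   key=lambda raw: (raw.lower(), raw))
    idx = bytearray()
    data = bytearray()
    for raw in words:
        definition = entries[raw.decode("utf-8")].encode("utf-8")
        # Headword, NUL byte, then offset and size into the .dict data.
        idx += raw + b"\0" + struct.pack(">II", len(data), len(definition))
        data += definition
    ifo = "\n".join([
        "StarDict's dict ifo file",
        "version=2.4.2",
        "wordcount=%d" % len(words),
        "idxfilesize=%d" % len(idx),
        "bookname=%s" % bookname,
        "sametypesequence=m",  # plain text definitions
    ]) + "\n"
    return ifo, bytes(idx), bytes(data)
```

The three results would be written to name.ifo, name.idx and name.dict. Producing the .dict.dz file is a separate step, since dictzip output is not ordinary gzip; running the dictzip tool on the .dict file should take care of that.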

In any case, I will do what I can, and I hope these comments lead to an improvement in this overall situation.  While the software is not bad, the difficulty in obtaining dictionaries is a problem.  For once, copyright is not the major hurdle, so I hope it will be possible to rapidly compile a large collection of good dictionaries for the good of children and curious people everywhere.

Sincerely,

LuYu

"How a society produces its information environment goes to the very core of freedom."


-- Yochai Benkler
_______________________________________________
Library mailing list
[email protected]
http://lists.laptop.org/listinfo/library
