Well, just to give an update.  I'm stuck in Rome.

So, I'm sitting around and have time to comment... :)

There are a couple of issues here that most of us know about, but I'll enumerate a few:

The lexdict driver in the engine is designed as a quick key lookup datastore. There is some processing done to try to do best matches.

All numeric keys are zero-padded, so

123 gets changed to 00123

This helps match Strong's numbers.
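
To illustrate the padding rule, here is a minimal sketch in Python (not the engine's actual C++ code). The function name and the width of 5 are assumptions; the width is taken only from the 123 -> 00123 example above.

```python
def pad_numeric_key(key: str, width: int = 5) -> str:
    """Zero-pad keys made up only of ASCII digits; leave other keys alone.

    `width` is an assumption based on the 123 -> 00123 example.
    """
    stripped = key.strip()
    if stripped.isascii() and stripped.isdigit():
        # zfill never truncates, so longer numbers pass through unchanged
        return stripped.zfill(width)
    return key


print(pad_numeric_key("123"))
print(pad_numeric_key("agape"))
```

Because the padded keys all have the same length, a plain string comparison orders them numerically, which is what makes a lookup by Strong's number land on the right entry.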

Other keys are uppercased to help with case-insensitive matching. We could maybe do other things, like stripping accents and diacritics, but we should probably add something to the .conf file to allow different key massaging. Just letting the module creator massage the keys beforehand isn't a great solution, because the same massaging needs to happen on the user's input when they try to look up a key.
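
The point about applying the same massaging on both sides can be sketched as follows. This is a Python illustration, not engine code: `massage_key`, `build_index`, `lookup`, and the `strip_accents` flag are all hypothetical names, with the flag standing in for whatever .conf setting might eventually be added.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Decompose to NFD and drop the combining marks (accents, breathings)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def massage_key(key: str, strip_accents: bool = False) -> str:
    """One normalization routine, shared by indexing and lookup."""
    if strip_accents:
        key = strip_diacritics(key)
    return key.upper()


def build_index(entries: dict, strip_accents: bool = False) -> dict:
    """Massage the stored keys when the index is built."""
    return {massage_key(k, strip_accents): v for k, v in entries.items()}


def lookup(index: dict, user_input: str, strip_accents: bool = False):
    """Massage the user's input the same way before looking it up."""
    return index.get(massage_key(user_input, strip_accents))


index = build_index({"λόγος": "word"}, strip_accents=True)
print(lookup(index, "λογος", strip_accents=True))
```

If only the stored keys were massaged, an accented query would miss the stripped key, which is why the rule has to live somewhere both the module and the lookup code can see it.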

Massaging keys is beneficial for looking up the best match in most cases, and presenting the surrounding entries sometimes helps the user choose if they didn't like the resolution. Let's not think of the surrounding keys as the 'order of the lexicon'.
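
A rough sketch of "best match plus surrounding entries" over a sorted list of massaged keys, in Python. The matching policy here (first key not less than the query, falling back to the last key) is an assumption for illustration, not necessarily what the lexdict driver does, and `best_match` is a hypothetical name.

```python
import bisect


def best_match(sorted_keys, wanted, context=2):
    """Return (resolved_key, neighbours) from a sorted list of massaged keys.

    Assumed policy: resolve to the first key >= wanted, or to the last key
    if wanted sorts after everything in the list.
    """
    if not sorted_keys:
        return None, []
    i = bisect.bisect_left(sorted_keys, wanted)
    if i == len(sorted_keys):
        i -= 1
    lo = max(0, i - context)
    hi = min(len(sorted_keys), i + context + 1)
    return sorted_keys[i], sorted_keys[lo:hi]


keys = ["00001", "00025", "00123", "00200", "03056"]
print(best_match(keys, "00100", context=1))
```

The neighbours come from the sort order of the massaged keys, which is an artifact of the index and not the order of the printed lexicon.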

Presenting a lexicon as an ordered book is a different function. We should not attempt to use the lexdict driver to support this.

If there is ever a time when you would want to present a lexicon as an ordered book to an end user, then we should consider possibly having a genbook index on the same module for that purpose.

I may be going out on a limb here, but most of the time, I don't think a user would want to see a dictionary presented as a book. I have come across one exception, and that is our Hesychius module, but it is really meant to be an ancient work studied as such. I think we could create a lexdict module from the Hesychius data and it would be cool to do lookups and present the data from it, like we do with other SWORD lexdicts, but the primary purpose of the module is to make the ancient work available to scholars in its original form; the original ancient work just happens to be a synonym lexicon.


Searching is a similar issue. StripFilters are used to massage the text into searchable form. The working theory has been that user input should be sent through the module's same StripFilter set. We don't enforce this in the SWModule::search method because there may be times when this isn't desirable, so it has been up to the frontend, if they think it is useful, to call module.StripText(searchTerm) before calling module.search(searchTerm).
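
The division of labour described above can be sketched like this. It is a toy Python model, not the SWORD API: the `Module` class, `strip_text`, `search`, and `remove_markup` are hypothetical stand-ins that only mirror the idea of StripText and search.

```python
import re


def remove_markup(text):
    """Toy strip filter: drop anything that looks like an XML/HTML tag."""
    return re.sub(r"<[^>]+>", "", text)


class Module:
    """Toy module: raw entries plus an ordered list of strip filters."""

    def __init__(self, entries, strip_filters):
        self.entries = entries
        self.strip_filters = strip_filters

    def strip_text(self, text):
        """Run text through every strip filter, in order."""
        for strip_filter in self.strip_filters:
            text = strip_filter(text)
        return text

    def search(self, term):
        """Match the term as given against stripped entry text.

        The term is deliberately NOT stripped here; that is the caller's call.
        """
        return [key for key, raw in self.entries.items()
                if term in self.strip_text(raw)]


module = Module({"John 1:1": "In the <w>beginning</w> was the Word"},
                [remove_markup, str.lower])
print(module.search("Beginning"))                     # term used as typed
print(module.search(module.strip_text("Beginning")))  # frontend opts in
```

Leaving the stripping outside `search` keeps the choice with the caller, at the cost that a frontend which forgets the extra call gets fewer hits than it expects.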

Maybe we should enforce some massaging logic in the engine instead of leaving it to the frontend to make the choice, but I'd rather leave the freedom to the consumer of the API to make the decision.

Just some comments...

        -Troy.




Eeli Kaikkonen wrote:
DM Smith wrote:


The problem is a bit deeper than that.


Yes, and there are some other things I want to bring up again lest they be forgotten.

1) The case may convey information, e.g. Liddell & Scott uses capitals for root words.
2) L&S uses a different ordering for iota subscriptum/accents/spiritus than BAGD, at least as far as I can remember.
3) Exact ordering may convey information, e.g. L&S adds the word "hence" at the end of some entries because the next entry depends on the previous one.

The information of 1) and 3) can be represented in some other way, but even if it's taken care of, the subjective quality suffers if lexicons don't follow the originals.

--Eeli Kaikkonen

_______________________________________________
sword-devel mailing list: [email protected]
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page

