Good morning,
Mike, thanks for the free consulting :)
I agree: in the case of English and French, I don't think I need to be
concerned with the tokenizing.
I may end up submitting an RFE on this, since future projects and other
customers may benefit (something similar to the thesaurus expansion API
sounds like a good approach to me). In the meantime I will be fine with
the help you and Danny provided. Speaking of which, as always, a
thank-you to you and everyone at Mark Logic for the support in getting
the most out of your product.
Best,
Shannon
On Oct 9, 2008, at 4:46 PM, Michael Blakeley wrote:
Shannon,
Hmm... I think we may be talking at cross-purposes. As I mentioned
yesterday, I'm a little concerned about maintaining a distinction
between cts:query term-level language, vs the language passed to
cts:tokenize() in lp:get-cts-query-element().
When I mentioned the idea of adding another parameter to
lp:get-cts-query(), I was thinking of the cts:tokenize() option. But I think I
jumped to a conclusion there. French and English aren't that
different, and I can't think of a place where the cts:tokenize()
language would matter (as opposed to, for example, Chinese).
Based on this latest email, and the exchange with Danny, you'd like
to pass multiple languages to lp:get-cts-query(), and get back an
internally-expanded or-query for every language for each input term.
This would work somewhat like thesaurus expansion. Is that correct?
If so, this does seem like a useful RFE (for lib-parser, or for
MarkLogic Server). But you can also do this in your own code fairly
easily:
xquery version "0.9-ml"

define function expand-languages($query as cts:query, $lang as xs:string*)
as cts:query
{
  if (empty($lang)) then $query else
  typeswitch($query)
  case cts:and-query return cts:and-query(
    for $q in cts:and-query-queries($query)
    return expand-languages($q, $lang),
    cts:and-query-options($query)
  )
  case cts:word-query return cts:or-query((
    let $opts :=
      cts:word-query-options($query)[not(starts-with(., 'lang='))]
    let $word := cts:word-query-text($query)
    for $i in $lang
    return cts:word-query($word, ($opts, concat('lang=', $i)))
  ))
  default return error(
    'UNIMPLEMENTED', text { 'no support for', xdmp:describe($query) } )
}

expand-languages(
  cts:and-query((
    cts:word-query('foo'),
    cts:word-query('bar')
  )), ('en', 'fr') )
=>
cts:and-query((
  cts:or-query((
    cts:word-query("foo", ("lang=en"), 1),
    cts:word-query("foo", ("lang=fr"), 1))),
  cts:or-query((
    cts:word-query("bar", ("lang=en"), 1),
    cts:word-query("bar", ("lang=fr"), 1)))),
  ())
Keep expanding the typeswitch to cover all the possibilities.
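As a rough sketch of that next step, and only on the assumption that the
parsed query may also contain cts:or-query and cts:not-query nodes, two
more cases could be added with the cts:or-query-queries() and
cts:not-query-query() accessors. Which constructors lib-parser actually
emits is not confirmed here, so treat the list of cases as illustrative
rather than complete.

```xquery
xquery version "0.9-ml"

define function expand-languages($query as cts:query, $lang as xs:string*)
as cts:query
{
  (: With no languages requested, leave the query untouched. :)
  if (empty($lang)) then $query else
  typeswitch($query)
  (: Composite queries: recurse into the children and rebuild. :)
  case cts:and-query return cts:and-query(
    for $q in cts:and-query-queries($query)
    return expand-languages($q, $lang),
    cts:and-query-options($query)
  )
  (: Assumed extra case: or-query, handled like and-query. :)
  case cts:or-query return cts:or-query(
    for $q in cts:or-query-queries($query)
    return expand-languages($q, $lang)
  )
  (: Assumed extra case: not-query wraps exactly one child query. :)
  case cts:not-query return cts:not-query(
    expand-languages(cts:not-query-query($query), $lang)
  )
  (: Leaf: one word-query per language, replacing any existing lang= option. :)
  case cts:word-query return cts:or-query((
    let $opts :=
      cts:word-query-options($query)[not(starts-with(., 'lang='))]
    let $word := cts:word-query-text($query)
    for $i in $lang
    return cts:word-query($word, ($opts, concat('lang=', $i)))
  ))
  (: Anything else still fails loudly, so gaps are easy to spot. :)
  default return error(
    'UNIMPLEMENTED', text { 'no support for', xdmp:describe($query) } )
}

expand-languages(
  cts:and-query((
    cts:word-query('foo'),
    cts:not-query(cts:word-query('bar'))
  )), ('en', 'fr') )
```

Keeping the default branch as an error means any query type that has not
been handled yet shows up immediately instead of silently skipping the
language expansion.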
-- Mike
Shannon wrote:
Thank you, Mike, that's very agreeable. Yes, per-query control over
language awareness would be most useful! Given a form that accepts
a query string input and a language selector that includes an
"all" option, the desired behavior is language-specific
tokenization, in this case for English and French. Danny
demystified the search recall logic, but lib-parser doesn't yet
provide the full support needed to get the most out of the French
language module. Currently I'm using the overloaded
lp:get-cts-query() that takes $options as the 3rd argument. Maybe another
overload with a 4th argument, or perhaps take the hint from the
lang option if supplied?
Thanks,
Shannon
On Oct 8, 2008, at 5:29 PM, Michael Blakeley wrote:
Today, lib-parser calls cts:tokenize() without the language
argument, so it always uses the database default language. So the
tokenization is language-aware, but there's no per-query control
over which language it uses.
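For illustration, a minimal sketch of the difference: cts:tokenize()
accepts an optional second argument naming the language. The sample
string is made up, and what comes back for 'fr' may depend on whether
the French language module mentioned in this thread is available on the
server.

```xquery
xquery version "0.9-ml"

(: Hypothetical sample text; any string would do. :)
let $text := "l'eau de vie"
return (
  (: No language argument: the default language is used,
     which is how lib-parser calls it today. :)
  cts:tokenize($text),
  (: Explicit language argument: per-call control over tokenization. :)
  cts:tokenize($text, 'fr')
)
```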
If per-query control over language awareness would be useful, how
would you like to express it? As another (optional) argument to
lp:get-cts-query()?
I'm a little concerned about maintaining a distinction between
cts:query term-level language, vs the language passed to
cts:tokenize() in lp:get-cts-query-element(). But if it's useful
functionality, let's figure out how to add it.
-- Mike
Shannon wrote:
Hi,
Does anyone know whether lib-parser has support for language-aware
tokenization, for lp:get-cts-query specifically?
Thanks,
__________________________________________________
Shannon Scott Shiflett, programmer/analyst with ROTUNDA,
The University of Virginia Press, Charlottesville, VA USA
http://rotunda.upress.virginia.edu
_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general