To make concrete progress, I tried "step 1": isolate XML semantics from
search/index behavior.
I reproduced the original bug exactly (in 8.0.3).
Then I cleared the DB and did the same thing, but instead of cts:search I used
this pure XQuery statement to check whether everything is right at the XML,
XDM and XQuery level.
It looks that way.
(The only non-standard function is xdmp:path(), which is just there to help
clarify what is being tested.)
Note: as per the W3C specs, the element and the attribute are correctly
identified with the right language. (There is no text content yet.)
Next step: see if search works as documented, to tell whether the problem is a
docs bug, a code bug, or an undefined-needs-clarification issue.
----------
for $d in doc()
return <doc uri="{fn:document-uri($d)}"> {
    for $e in
        ($d/element() , $d//bar , $d//bar/@type , $d//bar/text() )
    return
        <lang-for path="{xdmp:path($e)}"
            lang="{$e/ancestor-or-self::*/@xml:lang}">{
            for $l in ("en","de") return
                <is-lang lang="{$l}">{ fn:lang( $l , $e ) } </is-lang>
        }</lang-for>
}
</doc>
-----------------------
Result:
element
<doc uri="/test/foo1">
<lang-for path="/foo" lang="">
<is-lang lang="en">false</is-lang>
<is-lang lang="de">false</is-lang>
</lang-for>
<lang-for path="/foo/bar" lang="de">
<is-lang lang="en">false</is-lang>
<is-lang lang="de">true</is-lang>
</lang-for>
<lang-for path="/foo/bar/@type" lang="de">
<is-lang lang="en">false</is-lang>
<is-lang lang="de">true</is-lang>
</lang-for>
</doc>
element
<doc uri="/test/foo2">
<lang-for path="/foo" lang="">
<is-lang lang="en">false</is-lang>
<is-lang lang="de">false</is-lang>
</lang-for>
<lang-for path="/foo/bar" lang="de">
<is-lang lang="en">false</is-lang>
<is-lang lang="de">true</is-lang>
</lang-for>
<lang-for path="/foo/bar/@type" lang="de">
<is-lang lang="en">false</is-lang>
<is-lang lang="de">true</is-lang>
</lang-for>
</doc>
element
<doc uri="/test/foo3">
<lang-for path="/foo" lang="">
<is-lang lang="en">false</is-lang>
<is-lang lang="de">false</is-lang>
</lang-for>
<lang-for path="/foo/bar" lang="de">
<is-lang lang="en">false</is-lang>
<is-lang lang="de">true</is-lang>
</lang-for>
<lang-for path="/foo/bar/@type" lang="de">
<is-lang lang="en">false</is-lang>
<is-lang lang="de">true</is-lang>
</lang-for>
</doc>
element
<doc uri="/test/foo4">
<lang-for path="/foo" lang="">
<is-lang lang="en">false</is-lang>
<is-lang lang="de">false</is-lang>
</lang-for>
<lang-for path="/foo/bar" lang="de">
<is-lang lang="en">false</is-lang>
<is-lang lang="de">true</is-lang>
</lang-for>
<lang-for path="/foo/bar/@type" lang="de">
<is-lang lang="en">false</is-lang>
<is-lang lang="de">true</is-lang>
</lang-for>
</doc>
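
For anyone who wants to check the same rule without loading documents, here is
a minimal sketch in standard XQuery only. It assumes a made-up inline document
(not the /test/foo* documents from this thread), and the text content "Haus"
and the "de-AT" element are purely hypothetical additions, since the real test
documents have no text content yet. It only illustrates what fn:lang() is
specified to return; it says nothing about how cts:search behaves.

```xquery
xquery version "1.0";

(: Hypothetical inline document, NOT the thread's test data. :)
let $doc := document {
  <foo>
    <bar xml:lang="de" type="x">Haus</bar>
    <baz xml:lang="de-AT"/>
  </foo>
}
return (
  fn:lang("de", $doc/foo),             (: false: no xml:lang in scope :)
  fn:lang("de", $doc/foo/bar),         (: true:  xml:lang on the element itself :)
  fn:lang("de", $doc/foo/bar/@type),   (: true:  attribute takes its parent's language :)
  fn:lang("de", $doc/foo/bar/text()),  (: true:  text takes its parent's language :)
  fn:lang("en", $doc/foo/bar),         (: false: "de" is not "en" :)
  fn:lang("de", $doc/foo/baz)          (: true:  "de" matches the sublanguage "de-AT" :)
)
```

The first three values correspond to the /foo, /foo/bar and /foo/bar/@type rows
in the result above.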
-----------------------------------------------------------------------------
David Lee
Lead Engineer
MarkLogic Corporation
[email protected]
Phone: +1 812-482-5224
Cell: +1 812-630-7622
www.marklogic.com
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of David Lee
Sent: Thursday, June 25, 2015 9:24 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Indexing strategy for attributes when
using xdmp:xlst-invoke
> The docs are pretty clear that the xml:lang affects the language of
> the *child text* of elements,
The XML spec says <http://www.w3.org/TR/REC-xml/#sec-lang-tag>:
The language specified by xml:lang applies to the element where it is
specified (including the values of its attributes)
-----------
Yes, interesting.
However the test cases that were shown are using non-standard functions - i.e.
the result of indexes and other vendor specific features (not XPath or XDM or
XQuery standards).
How ML indexes things and returns results using cts:search() and such is not
covered by any specs except ML's.
The same is true for all implementations of products that extend a spec.
This isn't an excuse - there is obviously inconsistent behavior shown by pre
and post deindexing, but the test cases don't really uncover what that is
exactly - beyond 'unexpected'.
I am not suggesting this is anyone else's responsibility - just making a
personal observation on pre-judging exactly 'what' is broken when given a
particular test when testing features.
If the tests are not testing the documented behavior, it's not nearly so easy
(for anyone) to judge if the observations are 'correct' or not. The tests in
this thread (to my read) *neither* test the documented ML vendor-specific
features against the ML docs, *nor* test XQuery/XML core features against the
W3C docs. So it's not easy for either users or developers to make an objective
statement about whether it's 'right' or not - and if not, what exactly isn't
'right'.
That causes debates like this to proliferate instead of getting work done :)
NOTE: this is a general cross-industry / cross-company / cross-organization
statement, and a personal opinion.
I bring this up 'preemptively' to help prioritize something as a 'bug' or
'defect' vs 'that would be nice to improve ... someday'. If the problem
reported doesn't conflict with product-specific feature docs or core W3C docs,
especially if it doesn't appear to be a common use case, it's more likely to be
considered a 'feature enhancement request' than a 'bug fix' - and that
distinction (feature vs bug), whether the product is open source and written
for the love of it by volunteers, or proprietary and written by paid staff,
has a huge impact on if or when it will be considered.
If only we all had infinite clones and time and resources :)
_______________________________________________
General mailing list
[email protected]
Manage your subscription at:
http://developer.marklogic.com/mailman/listinfo/general