To make concrete progress, I tried "step 1": isolate XML semantics from 
search/index behavior.

I reproduced the original bug exactly (in 8.0.3).
Then I cleared the DB and repeated the same steps, but instead of cts:search I 
used the pure XQuery statement below to check whether everything is right at 
the XML, XDM and XQuery level.
It looks so.
(The only non-standard function is xdmp:path(), which is there just to help 
clarify what is being tested.)
Note: as per the W3C specs, the element and the attribute are correctly 
identified with the right language (no text content yet).

Next step: see whether search works as documented, to tell whether the problem 
is a docs bug, a code bug or an undefined-needs-clarification issue.


----------

(: For every document, report the xml:lang in scope and the fn:lang() result
   for the root element, each bar element, its @type attribute and its text. :)
for $d in doc()
return
  <doc uri="{fn:document-uri($d)}">{
    for $e in ($d/element(), $d//bar, $d//bar/@type, $d//bar/text())
    return
      <lang-for path="{xdmp:path($e)}"
                lang="{$e/ancestor-or-self::*/@xml:lang}">{
        for $l in ("en", "de")
        return <is-lang lang="{$l}">{ fn:lang($l, $e) }</is-lang>
      }</lang-for>
  }</doc>
-----------------------
Result:

element 
<doc uri="/test/foo1">
<lang-for path="/foo" lang="">
<is-lang lang="en">false</is-lang>
<is-lang lang="de">false</is-lang>
</lang-for>
<lang-for path="/foo/bar" lang="de">
<is-lang lang="en">false</is-lang>
<is-lang lang="de">true</is-lang>
</lang-for>
<lang-for path="/foo/bar/@type" lang="de">
<is-lang lang="en">false</is-lang>
<is-lang lang="de">true</is-lang>
</lang-for>
</doc>
element 
<doc uri="/test/foo2">
<lang-for path="/foo" lang="">
<is-lang lang="en">false</is-lang>
<is-lang lang="de">false</is-lang>
</lang-for>
<lang-for path="/foo/bar" lang="de">
<is-lang lang="en">false</is-lang>
<is-lang lang="de">true</is-lang>
</lang-for>
<lang-for path="/foo/bar/@type" lang="de">
<is-lang lang="en">false</is-lang>
<is-lang lang="de">true</is-lang>
</lang-for>
</doc>
element 
<doc uri="/test/foo3">
<lang-for path="/foo" lang="">
<is-lang lang="en">false</is-lang>
<is-lang lang="de">false</is-lang>
</lang-for>
<lang-for path="/foo/bar" lang="de">
<is-lang lang="en">false</is-lang>
<is-lang lang="de">true</is-lang>
</lang-for>
<lang-for path="/foo/bar/@type" lang="de">
<is-lang lang="en">false</is-lang>
<is-lang lang="de">true</is-lang>
</lang-for>
</doc>
element 
<doc uri="/test/foo4">
<lang-for path="/foo" lang="">
<is-lang lang="en">false</is-lang>
<is-lang lang="de">false</is-lang>
</lang-for>
<lang-for path="/foo/bar" lang="de">
<is-lang lang="en">false</is-lang>
<is-lang lang="de">true</is-lang>
</lang-for>
<lang-for path="/foo/bar/@type" lang="de">
<is-lang lang="en">false</is-lang>
<is-lang lang="de">true</is-lang>
</lang-for>
</doc>
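
As a cross-check outside MarkLogic, the rule being tested above (the nearest 
xml:lang on the element or an ancestor decides the language, and an attribute 
takes the language of its owner element) can be sketched in plain Python. This 
is only an illustration: the sample document and the helper names 
effective_lang and is_lang are assumptions, since the actual test documents 
are not shown in this thread, and is_lang only approximates fn:lang().

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample; the shape is guessed from the paths in the result above.
SAMPLE = '<foo><bar xml:lang="de" type="x"/></foo>'


def effective_lang(elem, parents):
    """Return the nearest xml:lang on elem or one of its ancestors, else None."""
    while elem is not None:
        value = elem.get(XML_LANG)
        if value is not None:
            return value
        elem = parents.get(elem)
    return None


def is_lang(test, elem, parents):
    """Approximate fn:lang(): case-insensitive match on language or sublanguage."""
    value = effective_lang(elem, parents)
    if not value:
        # No xml:lang in scope, or it was reset with an empty string.
        return False
    value, test = value.lower(), test.lower()
    return value == test or value.startswith(test + "-")


root = ET.fromstring(SAMPLE)
# ElementTree has no parent pointers, so build a child -> parent map.
parents = {child: parent for parent in root.iter() for child in parent}
bar = root.find("bar")

# Attributes are not separate nodes in ElementTree; /foo/bar/@type shares
# the language of its owner element /foo/bar.
results = {
    "/foo": {lang: is_lang(lang, root, parents) for lang in ("en", "de")},
    "/foo/bar": {lang: is_lang(lang, bar, parents) for lang in ("en", "de")},
}
print(results)
```

For this sample the sketch gives the same true/false pattern as the XQuery 
result above: nothing matches on /foo, and only "de" matches on /foo/bar.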

-----------------------------------------------------------------------------
David Lee
Lead Engineer
MarkLogic Corporation
[email protected]
Phone: +1 812-482-5224
Cell:  +1 812-630-7622
www.marklogic.com

-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of David Lee
Sent: Thursday, June 25, 2015 9:24 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Indexing strategy for attributes when 
using xdmp:xlst-invoke


> The docs are pretty clear that the xml:lang  affects the language of 
> the *child text* of elements,

  The XML spec says <http://www.w3.org/TR/REC-xml/#sec-lang-tag>:

    The language specified by xml:lang applies to the element where it is
    specified (including the values of its attributes)

Yes, interesting.

However, the test cases that were shown use non-standard functions, i.e. they 
show the result of indexes and other vendor-specific features (not the XPath, 
XDM or XQuery standards).
How ML indexes things and returns results using cts:search() and the like is 
not covered by any specs except ML's.
The same is true for any product that extends a spec.

This isn't an excuse. There is obviously inconsistent behavior shown pre and 
post deindexing, but the test cases don't really uncover what that is exactly, 
beyond 'unexpected'.

I am not suggesting this is anyone else's responsibility. I am just making a 
personal observation about pre-judging exactly 'what' is broken on the basis 
of a particular test.
If the tests are not testing the documented behavior, it's not nearly so easy 
(for anyone) to judge whether the observations are 'correct' or not. The tests 
in this thread (to my read) test *neither* the documented ML vendor-specific 
features against the ML docs *nor* the XQuery/XML core features against the 
W3C docs. So it's not easy for either users or developers to make an objective 
statement about whether it's 'right' or not, and if not, what exactly isn't 
'right'.
 
That causes debates like this to proliferate instead of getting work done :)


NOTE: (This is a general cross-industry / cross-company/organization 
statement, and a personal opinion.)

I bring this up 'preemptively' to help prioritize something as a 'bug' or 
'defect' vs 'that would be nice to improve ... someday'. If the problem 
reported doesn't conflict with the product-specific feature docs or the core 
W3C docs, and especially if it doesn't appear to be a common use case, it's 
more likely to be considered a 'feature enhancement request' than a 'bug fix'. 
And that distinction (feature vs bug), whether the product is open source and 
written by the love of volunteers or proprietary and written by paid staff, 
has a huge impact on if or when it will be considered.
If only we all had infinite clones and time and resources :)






_______________________________________________
General mailing list
[email protected]
Manage your subscription at: 
http://developer.marklogic.com/mailman/listinfo/general