RE: [MarkLogic Dev General] intersect? nodes

Paul M Thu, 21 Jan 2010 10:38:17 -0800

What if there is one small change...

let $s := <a><b v="1"/><b v="2"/><b v="3"/></a>
let $t := <a><b v="2"/><b v="3"/></a>
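With the values moved into @v attributes, there are two possible filters in plain XQuery 1.0. The first compares only the @v values. The second compares whole elements with fn:deep-equal, which also checks element names, every attribute, and child content. This is a minimal sketch assuming exactly the $s and $t shown above. Both filters should return <b v="1"/> for this data.

```xquery
xquery version "1.0";

let $s := <a><b v="1"/><b v="2"/><b v="3"/></a>
let $t := <a><b v="2"/><b v="3"/></a>
return (
  (: Compare only the @v values: keeps <b v="1"/> :)
  $s/b[fn:not(@v = $t/b/@v)],
  (: Compare whole elements (name, attributes, children) with
     fn:deep-equal: also keeps <b v="1"/> :)
  $s/b[fn:not(some $x in $t/b satisfies fn:deep-equal(., $x))]
)
```

The first form is cheaper when only one attribute matters. The second is the safer choice if the <b> elements can differ in other attributes or content.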

--- On Thu, 1/21/10, [email protected] 
<[email protected]> wrote:

From: [email protected] 
<[email protected]>
Subject: General Digest, Vol 67, Issue 22
To: [email protected]
Date: Thursday, January 21, 2010, 9:47 AM

Send General mailing list submissions to
    [email protected]

To subscribe or unsubscribe via the World Wide Web, visit
    http://xqzone.com/mailman/listinfo/general
or, via email, send a message with subject or body 'help' to
    [email protected]

You can reach the person managing the list at
    [email protected]

When replying, please edit your Subject line so it is more specific
than "Re: Contents of General digest..."

Today's Topics:

   1. Re: question about cts:search (Helen Chen)
   2. Re: question about cts:search (Helen Chen)
   3. intersect? nodes (Paul M)
   4. RE: intersect? nodes (Danny Sokolsky)
   5. content transformations and RecordLoader (via content_factory ?)
      (Lewon, Paul (GPMS))

----------------------------------------------------------------------

Message: 1
Date: Thu, 21 Jan 2010 10:49:34 -0500
From: Helen Chen <[email protected]>
Subject: Re: [MarkLogic Dev General] question about cts:search
To: General Mark Logic Developer Discussion
    <[email protected]>
Cc: Helen Chen <[email protected]>
Message-ID: <[email protected]>
Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes

Hi Doug,

I tried your query. I think it fails when the XPath returns more than
one node, but if I add a for loop to it, it works fine.

I think my trouble now is this: everyone thinks that putting the
function base-uri(), document-uri() or xdmp:node-uri() at the end of
the XPath should work, but I cannot make it work. I'm going to set up
another test environment with MarkLogic 4 and see if there is any
difference.

I'll keep everyone posted.

Thanks for everyone's help.

Helen
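Here is a hedged sketch of the for-loop form mentioned above. It has not been run against this database. It adds fn:distinct-values so that a document containing several matching ns1:sub nodes (seven in /pt/ajr_1.xml) is reported only once. The ns1 namespace URI is a placeholder because the real one does not appear in the thread.

```xquery
xquery version "1.0-ml";

(: Placeholder: the real ns1 namespace URI is not given in the thread. :)
declare namespace ns1 = "http://example.com/ns1";

(: Iterate explicitly instead of ending the path with fn:base-uri(.),
   then collapse duplicate URIs so each document is listed once. :)
fn:distinct-values(
  for $sub in doc()/ns1:article//ns1:sub[fn:not(@temp1 or @temp2)]
  return fn:base-uri($sub)
)
```

For documents stored in the database, replacing the return clause with fn:document-uri(fn:root($sub)) or xdmp:node-uri($sub) should normally give the same URI. fn:base-uri can differ if a document uses xml:base.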

On Jan 20, 2010, at 5:37 PM, Glidden, Douglass A wrote:

> Helen,
>
> I'm not sure this will make a difference, but try these alternatives:
>
>  fn:base-uri(doc()/ns1:article//ns1:sub[not(@temp1 or @temp2)])
>
>  fn:document-uri(doc()[ns1:article//ns1:sub[not(@temp1 or @temp2)]])
>
> I'm not sure (internally) how this usage differs from the other, so  
> it may or may not work.  In the document-uri case, I made a change  
> to the XPath to avoid having to use root function, but you can try  
> either way.
>
> Doug Glidden
> Software Engineer
> The Boeing Company
> [email protected]
>
> -----Original Message-----
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of Helen Chen
> Sent: Wednesday, January 20, 2010 16:56
> To: General Mark Logic Developer Discussion
> Cc: Helen Chen
> Subject: Re: [MarkLogic Dev General] question about cts:search
>
> Hi Geert,
>
> Thanks. Now I understand why document-uri() sometimes returns the
> empty sequence: the node I pass to it is not a document node.
> So I did the following test:
>
> -------------
> for $doc in doc()/ns1:article//ns1:sub[not(@temp1 or @temp2)]
> let $uri := try { base-uri($doc) } catch ($e) { () }
> where not($uri)
> return $doc
>
> It returns the empty sequence.
>
> -----------------------
> for $i in doc()/ns1:article//ns1:sub[not(@temp1 or @temp2)]
> return base-uri($i)
>
> It returns 7 URIs, and they are all the same URI (/pt/ajr_1.xml),
> just because 7 ns1:sub nodes are found.
>
> ------------------
>
> When I tried to run
> doc()/ns1:article//ns1:sub[not(@temp1 or @temp2)]/fn:base-uri(.)
>
> I got this error:
> Description: XDMP-NOTANODE: doc()/child::ns1:article/
> descendant::ns1:sub[attribute::temp1 or attribute::temp2]/base-uri(.)
> -- xs:anyURI("/pt/ajr_1.xml") is not a node
>
>
> The base-uri() function returns an xs:anyURI, not a node. Is that
> what it is complaining about?
>
> Thanks, Helen
>
>
> On Jan 20, 2010, at 4:19 PM, Geert Josten wrote:
>
>> Hi Helen,
>>
>> I personally prefer base-uri, as it navigates up to the root itself
>> if necessary. You can read more about the two functions here
>> (read the Summary and User Notes in particular):
>>
>> http://developer.marklogic.com/pubs/4.1/apidocs/AccessorBuiltins.html#fn:document-uri
>>
>> Kind regards,
>> Geert
>>
>>> -----Original Message-----
>>> From: [email protected]
>>> [mailto:[email protected]] On Behalf Of Helen
>>> Chen
>>> Sent: Wednesday, 20 January 2010 22:05
>>> To: General Mark Logic Developer Discussion
>>> Cc: Helen Chen
>>> Subject: Re: [MarkLogic Dev General] question about cts:search
>>>
>>> Hi Geert,
>>>
>>> I ran your query
>>>
>>> for $doc in doc()/ns1:article//ns1:sub[not(@temp1 or @temp2)]
>>> let $uri := try { base-uri($doc) } catch ($e) { () }
>>> where not($uri)
>>> return $doc
>>>
>>> and it returns the empty sequence.
>>> ------------
>>>
>>> If I take "not" out of the where clause, like
>>>
>>> for $doc in doc()/ns1:article//ns1:sub[not(@temp1 or @temp2)]
>>> let $uri := try { base-uri($doc) } catch ($e) { () }
>>> where ($uri)
>>> return $doc
>>>
>>> I get 7 nodes back. These nodes have only attributes and no child
>>> nodes, like <ns1:sub file="1"/>, and they are all in the same XML
>>> file: /pt/ajr_1.xml
>>>
>>> --------------
>>> I actually can run
>>>
>>> doc()/ns1:article//ns1:sub[not(@temp1 or @temp2)]/fn:root(.)
>>>
>>> and it returns the root of file /pt/ajr_1.xml, and only one result
>>> node
>>>
>>> ---------
>>> but I cannot run
>>> doc()/ns1:article//ns1:sub[not(@temp1 or @temp2)]/fn:document-uri(fn:root(.))
>>>
>>>
>>> I thought it was because multiple nodes were returned, but when I use
>>> fn:root(.) at the end, it returns only one node, so it must be
>>> something else.
>>>
>>> When I do
>>>     doc()/ns1:article//ns1:sub[not(@temp1 or @temp2)]/fn:document-uri(.)
>>> it returns the empty sequence.
>>>
>>> When I do
>>>     doc()/ns1:article//ns1:sub[not(@temp1 or @temp2)]/fn:document-uri(fn:root(.))
>>> it gives an error.
>>>
>>> When I do
>>>     doc()/ns1:article//ns1:sub[not(@temp1 or @temp2)]/fn:document-uri(fn:root(.)/ns1:article)
>>> which makes sure fn:document-uri() gets only one node, it returns
>>> the empty sequence.
>>>
>>> It seems that fn:root() passes the root element to fn:document-uri()
>>> and it does not like it; if I pass a node to it, it returns the empty
>>> sequence. Can we use fn:document-uri() or fn:base-uri() at the end of
>>> a query?
>>>
>>>
>>> Thanks, Helen
>>>
>>>
>>> On Jan 20, 2010, at 3:33 PM, Geert Josten wrote:
>>>
>>>> Hi Helen,
>>>>
>>>>> I changed it and I got the same error as when I use fn:base-uri(.)
>>>>>
>>>>> Description: XDMP-NOTANODE: doc()/child::ns1:article/
>>>>> descendant::ns1:sub[not(attribute::temp1 or
>>>>> attribute::temp2)]/document-uri(root(.)) --
>>>>> xs:anyURI("/pt/ajr_1.xml") is not a node
>>>>>
>>>>> The result is really only one document, but it seems that I cannot
>>>>> put fn:document-uri at the end of the query.
>>>>
>>>> This is actually quite strange. doc() always returns nodes or the
>>>> empty sequence, child always returns nodes or the empty sequence,
>>>> descendant always returns nodes or the empty sequence, so this is
>>>> very odd. Even a text or binary document would result in a node.
>>>>
>>>> Could you try the following to figure out which document is causing
>>>> the trouble?
>>>>
>>>> for $doc in doc()/ns1:article//ns1:sub[not(@temp1 or @temp2)]
>>>> let $uri := try { base-uri($doc) } catch ($e) { () }
>>>> where not($uri)
>>>> return
>>>>  $doc
>>>>
>>>> Kind regards,
>>>> Geert
>>>>
>>>>
>>>> Drs. G.P.H. Josten
>>>> Consultant
>>>>
>>>>
>>>> http://www.daidalos.nl/
>>>> Daidalos BV
>>>> Source of Innovation
>>>> Hoekeindsehof 1-4
>>>> 2665 JZ Bleiswijk
>>>> Tel.: +31 (0) 10 850 1200
>>>> Fax: +31 (0) 10 850 1199
>>>> http://www.daidalos.nl/
>>>> KvK 27164984
>>>> The information sent in or with this e-mail message originates from
>>>> Daidalos BV and is intended solely for the addressee. If you have
>>>> received this message unintentionally, please delete it. No rights
>>>> can be derived from this message.
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> General mailing list
>>>> [email protected]
>>>> http://xqzone.com/mailman/listinfo/general
>>>

------------------------------

Message: 2
Date: Thu, 21 Jan 2010 11:11:06 -0500
From: Helen Chen <[email protected]>
Subject: Re: [MarkLogic Dev General] question about cts:search
To: [email protected]
Cc: General Mark Logic Developer Discussion
    <[email protected]>,    Helen Chen <[email protected]>
Message-ID: <[email protected]>
Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes

Hi Florent,

Based on the documentation at
http://developer.marklogic.com/pubs/4.1/apidocs/AccessorBuiltins.html#fn:base-uri
base-uri() returns xs:anyURI, and according to
http://www.w3.org/TR/xpath-functions/#func-base-uri
it also returns xs:anyURI.

Maybe I missed something in this problem. I checked the query with
"/." at the end, and it returns nodes. I feel that I just cannot put a
function like fn:base-uri() at the end of an XPath using the "/"
operator, and I don't know where I went wrong. I'm testing more and
I'll keep everyone posted.

Thanks,
Helen
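Here is a small self-contained check of the return type. It uses a hypothetical in-memory element whose absolute xml:base attribute supplies a known base URI, so no database is needed. According to the W3C Functions and Operators specification, fn:base-uri returns xs:anyURI? rather than xs:string. xs:anyURI is a primitive type, not a subtype of xs:string. fn:string() converts it when a string is needed. Because the result is an atomic value, any later step that expects a node would be expected to fail with a "not a node" style error. That is consistent with Florent's reading.

```xquery
xquery version "1.0";

(: Hypothetical stand-in for one ns1:sub node; the absolute xml:base
   gives it a known base URI without needing a database. :)
let $node := <sub xml:base="http://example.com/pt/ajr_1.xml" file="1"/>
let $uri := fn:base-uri($node)
return (
  $uri instance of xs:anyURI,            (: true :)
  $uri instance of xs:string,            (: false: xs:anyURI is not derived from xs:string :)
  fn:string($uri) instance of xs:string  (: true, after explicit conversion :)
)
```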

On Jan 20, 2010, at 7:23 PM, Florent Georges wrote:

> Helen Chen wrote:
>
>  Hi,
>
>> Description: XDMP-NOTANODE: doc()/child::ns1:article/
>> descendant::ns1:sub[attribute::temp1 or attribute::temp2]/base-uri(.)
>> -- xs:anyURI("/pt/ajr_1.xml") is not a node
>
>  It looks as if the result of this expression was used later in a
> context where a node is expected (either because you access the
> result as a node in the calling environment, or because you use it
> later in the query, for instance by applying the '/' operator to it).
>
>  But base-uri() must return a xs:string, not an xs:anyURI...
>
>  Regards,
>
> -- 
> Florent Georges
> http://www.fgeorges.org/

------------------------------

Message: 3
Date: Thu, 21 Jan 2010 09:23:48 -0800 (PST)
From: Paul M <[email protected]>
Subject: [MarkLogic Dev General] intersect? nodes
To: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain; charset="us-ascii"

let $s := <a><b>1</b><b>2</b><b>3</b></a>
let $t := <a><b>2</b><b>3</b></a>

I want the <b>1</b> node returned. I want to know that $t is missing
<b>1</b>. $s is always the larger set; $s should always include all of $t.

A for/let with a where clause is the only approach I could see.
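Here is a hedged sketch in plain XQuery 1.0, assuming the $s and $t above. It also shows why the set operators (intersect, except) do not fit this case directly. They compare node identity, and the <b> elements of $s and $t are distinct nodes even when their text is equal. A predicate using the general comparison "=" compares string values instead. That predicate returns only the missing element, with no explicit for/where.

```xquery
xquery version "1.0";

let $s := <a><b>1</b><b>2</b><b>3</b></a>
let $t := <a><b>2</b><b>3</b></a>
return (
  (: "except" works on node identity. The <b> elements of $s and $t are
     different nodes, so all three <b> elements of $s survive: 3 :)
  fn:count($s/b except $t/b),
  (: "=" compares string values and is true if ANY <b> in $t matches,
     so only the missing element is kept: <b>1</b> :)
  $s/b[fn:not(. = $t/b)]
)
```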

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
http://xqzone.marklogic.com/pipermail/general/attachments/20100121/bcf23af3/attachment-0001.html

------------------------------

Message: 4
Date: Thu, 21 Jan 2010 09:35:48 -0800
From: Danny Sokolsky <[email protected]>
Subject: RE: [MarkLogic Dev General] intersect? nodes
To: General Mark Logic Developer Discussion
    <[email protected]>
Message-ID:
    <[email protected]>
Content-Type: text/plain; charset="us-ascii"

How about something like:

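xquery version "1.0-ml";
let $s := <a><b>1</b><b>2</b><b>3</b></a>
let $t := <a><b>2</b><b>3</b></a>
(: "=" is a general comparison: true if $b equals any item in $t/b.
   The value comparison "eq" expects a single item on each side. :)
for $b at $i in $s/b
return
(<res>
   <i>{$i}</i>
   <matches>{$b = $t/b}</matches>
 </res>)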

Which returns:

<res><i>1</i><matches>false</matches></res>
<res><i>2</i><matches>true</matches></res>
<res><i>3</i><matches>true</matches></res>

-Danny


------------------------------

Message: 5
Date: Thu, 21 Jan 2010 12:47:37 -0500
From: "Lewon, Paul (GPMS)" <[email protected]>
Subject: [MarkLogic Dev General] content transformations and
    RecordLoader (via content_factory ?)
To: "'[email protected]'"
    <[email protected]>
Message-ID:
    <[email protected]>
Content-Type: text/plain; charset="us-ascii"

Hi all,

Here's my situation:

*       I have a mixture of XML, SGML, and simple markup, and can't rely on
the file extensions to tell me what's what.
*       I have a working RecordLoader install, a MarkLogic database, and a file
staging area where folks are collecting the content.
*       The process of collecting the files and creating new files is ongoing;
I have no idea how many files we'll have, but it will almost certainly be
several hundred thousand.
*       Many files have SGML doctype declarations, but not all. A few have
XML doctype declarations.
*       Most are not UTF-8, but some declare themselves as UTF-8 in the doctype
declaration even though they aren't.

What I'd like to do:

*       I need to ingest it all into the MarkLogic database.
*       I want to clear the database and repeat the ingest on a regular (TBD)
basis.
*       I do not want to pre-process the content before RecordLoader if at
all possible.
*       I want to handle the content according to the following logic:
    *   Handle incoming content as non-UTF-8.
    *   If incoming files are binary, do not load them.
    *   If the incoming file has a doctype declaration, XML or SGML, handle it
        by converting it to XML, removing problematic processing instructions,
        and pre-empting MarkLogic from turning SGML singletons into nested XML
        nodes (via default stack-level repair) by instead turning them into
        properly tagged XML singletons.
        (That is, I want <date><year year="2006"><month month="1"><day
        day="1"></date> to become <date><year year="2006" /><month month="1" /><day
        day="1" /></date> and not <date><year year="2006"><month month="1"><day
        day="1"></day></month></year></date>.)
    *   Unless incoming content declares its namespace, load all content into
        the empty namespace.
    *   Else if the incoming file has a top-level node, treat it as XML.
    *   Else, ingest it as text.

And finally, the question:

Can I use RecordLoader to do this without pre-processing? I'm having a
hard time wrapping my head around the processing paradigm of CONTENT_FACTORY
via RecordLoader. Is XQuery-based content handling via CONTENT_FACTORY going to
fire after the MarkLogic Server has already handled incoming SGML singletons?
And if this is possible, does it put too much burden on RecordLoader (i.e., is
it scalable and repeatable on a regular basis)?

Thank you,
Paul

Paul Lewon
Production Technology, Global Production & Manufacturing Services
Cengage Learning
27500 Drake Rd. Farmington Hills, MI  48331

E-mail: [email protected] | www.cengage.com


------------------------------


End of General Digest, Vol 67, Issue 22
***************************************
