One more interesting tidbit:
This expression did NOT use the indexes:

/rxnsat//row[RXAUI eq $id2]

But this one did:
//row[RXAUI eq $id2]
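
(For anyone who wants to reproduce this: if your server build has
xdmp:plan, comparing the plans of the two forms should show whether each
one gets resolved from the indexes or falls back to walking the
fragments. Just a sketch -- I've substituted the literal key value for
$id2:)

declare variable $id2 as xs:string := '2483417';

(: plan for the form that was NOT index-resolved :)
xdmp:plan(/rxnsat//row[RXAUI eq $id2]),
(: plan for the form that was :)
xdmp:plan(//row[RXAUI eq $id2])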





-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Lee, David
Sent: Sunday, December 06, 2009 7:56 AM
To: General Mark Logic Developer Discussion
Subject: RE: [MarkLogic Dev General] Interesting case where ML refuses
to optimize XPath

That query doesn't do what I want, because (shame on me) I have multiple
docs with //row elements.
But just for testing I ran it, and it performs about the same as the
cts:search case (4.5 sec), so it seems to be using indexes in that case.
Even that seems too slow to me when the result set is 8 records of about
100 bytes each.


My real question here is one I'm still trying to answer, and one that I
think many people are asking:
Can I get MarkLogic to perform like an RDBMS in the (hopefully rare)
cases where the data really is like RDB data?
That is, lots (millions) of small, identical "rows" of data where I'd
like to 'simply' look up a row by an exact key match. Not word, phrase,
or wildcard searching of big docs in the haystack, but a real
RDBMS-style single-key-lookup index.
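
One thing that might get closer to a real key index is an element range
index. Roughly like this (only a sketch -- it assumes a string range
index on RXAUI has been added in the Admin UI; with row as the fragment
root, "unfiltered" should be safe here, but that's worth verifying):

declare variable $id2 as xs:string := '2483417';

(: exact-value lookup against a string range index on RXAUI;
   "unfiltered" skips result filtering and relies on index
   resolution alone :)
for $r in cts:search(
  doc("/RxNorm/rxnsat.xml")/rxnsat/row,
  cts:element-range-query(xs:QName("RXAUI"), "=", $id2),
  "unfiltered")
return $r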

I tried creating a FIELD, but that didn't seem to do much good.
(The field search wasn't any faster than element-word searches.)
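
For reference, the field attempt looked roughly like this (a sketch --
'rxaui' is just whatever the field over the RXAUI element was named in
the Admin UI):

declare variable $id2 as xs:string := '2483417';

(: word query against a field defined over the RXAUI element :)
for $r in cts:search(
  doc("/RxNorm/rxnsat.xml")/rxnsat/row,
  cts:field-word-query("rxaui", $id2))
return $r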

What's interesting is that I can search other document sets and return
hundreds of results in < 200 ms, but this one is really thrashing ML.
I suspect it's due to the high fragmentation (3 million fragments).
But what's the suggestion when the data really is flat like this? If I
don't fragment it, it makes a 1 GB XML file ... which blows up ML.
There's no structure in between.
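
(If the one-document-per-row split that was suggested turns out to be
the way to go, I imagine the load side would look roughly like this --
only a sketch: the URIs are made up, and 2 million inserts would have to
be batched or spawned rather than run in one transaction:)

(: split each row fragment out into its own small document :)
for $row at $i in doc("/RxNorm/rxnsat.xml")/rxnsat/row
return
  xdmp:document-insert(
    concat("/RxNorm/rxnsat/row-", $i, ".xml"),
    $row)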

What I'm going to experiment with next is sticking this particular file
in an RDBMS and using the SQL connector code ... yuck. I was really
hoping not to do that.


Another idea, which I think is pretty ugly but might help, is to
artificially create structure where none exists. For example, group the
records by the first 2 digits of the key value into a document, which
would reduce the fragmentation by 100x. But even getting this
restructuring done is painful, because the doc is too big to load into
memory, so I need a DB just to get at it. Which probably means I load it
into an RDBMS to restructure the XML, or maybe just leave it there.
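
If I instead do the key-prefix grouping inside ML, it would be roughly
this (again only a sketch -- the group URIs are made up, and doing all
the groups in one transaction may still be too much for a single
statement):

declare variable $src := doc("/RxNorm/rxnsat.xml")/rxnsat;

(: group rows into one document per 2-digit RXAUI prefix :)
for $prefix in distinct-values(
  for $key in $src/row/RXAUI
  return substring($key, 1, 2))
return
  xdmp:document-insert(
    concat("/RxNorm/rxnsat/", $prefix, ".xml"),
    <rxnsat>{ $src/row[starts-with(RXAUI, $prefix)] }</rxnsat>)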

I'm sure others have had this kind of problem. Any suggestions for
techniques for handling millions of "rows" of very small "records"?



 


-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Jason
Hunter
Sent: Sunday, December 06, 2009 12:31 AM
To: General Mark Logic Developer Discussion
Subject: Re: [MarkLogic Dev General] Interesting case where ML refuses
to optimize XPath

I'm curious what speed you see for this:

//row[RXAUI = $id2]

I'm assuming row is your fragment root.

-jh-

On Dec 5, 2009, at 6:20 AM, Lee, David wrote:

> I have 2 xml docs, each about 1 GB and about 2 million fragments
> ("rows") each ... in fact the elements are called "rows".
> Each "row" element is about 500 bytes. But I don't yet have a better
> way to fragment them.
> (Yes, it's been suggested to split these into separate docs, and I may
> experiment with that.)
>  
> Here's a case where I've found ML refuses to optimize xpaths.
>  
> First off, this expression takes about 5 seconds, which I find a
> little slow ... it returns 8 rows.
>  
>  
> declare variable $id := '2483417';
> for $r in doc("/RxNorm/rxnsat.xml")/rxnsat/row[RXAUI eq $id]
> return $r
>  
>  
> Now to complicate things, I actually need $id from a previous query,
> so the real query is more like:
>  
>  
> declare variable $id := '2483417';
> declare variable $c :=
>   doc("/RxNorm/rxnconso.xml")/rxnconso/row[RXAUI eq $id];
> declare variable $id2 as xs:string := $c/RXAUI/string();
>  
> for $r in doc("/RxNorm/rxnsat.xml")/rxnsat/row[RXAUI eq $id2]
> return $r
>  
> This takes about 1 minute! Checking the profile, I find the
> expression row[RXAUI eq $id] is evaluated a million times ...
> indicating it's not doing indexing.
>  
> I've tried all sorts of combinations of these like
>  
> doc("/RxNorm/rxnsat.xml")/rxnsat/row[xs:string(RXAUI) eq $id2]
> doc("/RxNorm/rxnsat.xml")/rxnsat/row[RXAUI eq $c/RXAUI]
> doc("/RxNorm/rxnsat.xml")/rxnsat/row/RXAUI[. eq $id2]/ancestor::row
>  
>  
> All to no avail ... no indexing!
>  
> But of course this brings things back to speed:
>  
> ---------
> for $r in cts:search(doc("/RxNorm/rxnsat.xml")/rxnsat/row,
> cts:element-query( xs:QName("RXAUI") , $id2 ))
> return $r
>  
> ------------
>  
>  
> Still takes too long (about 5 sec) ... but it's back to real time at
> least.
>  
> I'm experimenting now with fields ...
>  
> But I find it strange that I can't get the xpath expression to use the
> indexes in one case, but it does in another that seems almost
> identical to me.
>  
> This expression
> declare variable $id2 as xs:string := $c/RXAUI/string();
>  
> should tell the system that $id2 is a single string, so why won't it
> use it in XPath-based index queries?
>  
>  
>  
>  
> ----------------------------------------
> David A. Lee
> Senior Principal Software Engineer
> Epocrates, Inc.
> [email protected]
> 812-482-5224
>  
>  

_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general
