One more interesting tidbit: this expression did NOT use the indexes:

    /rxnsat//row[RXAUI eq $id2]

But this one did:

    //row[RXAUI eq $id2]
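
One way to confirm what the optimizer is doing with each path, assuming the MarkLogic build in use provides xdmp:plan, is to compare the plans side by side. A minimal sketch (the sample value is the RXAUI used earlier in the thread):

    (: Sketch: compare the optimizer's treatment of the two paths.
       Assumes xdmp:plan() is available in this MarkLogic build;
       "2483417" is the sample RXAUI value from the thread. :)
    declare variable $id2 as xs:string := "2483417";

    xdmp:plan(/rxnsat//row[RXAUI eq $id2]),
    xdmp:plan(//row[RXAUI eq $id2])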

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Lee, David
Sent: Sunday, December 06, 2009 7:56 AM
To: General Mark Logic Developer Discussion
Subject: RE: [MarkLogic Dev General] Interesting case where ML refuses to optimize XPath

That query doesn't do what I want, because (shame on me) I have multiple docs with //row elements. But just for testing I ran it, and it performs about the same as the cts:search case (4.5 sec), so it seems to be using indexes in that case. Even that seems too slow for me, when the result set is 8 records of about 100 bytes each.

My real question here is one I'm trying to work out, and one that I think many people are asking: can I get MarkLogic to perform like an RDBMS in the (hopefully rare) cases where the data really is like RDB data? That is, lots (millions) of small, identical "rows" of data where I'd like to 'simply' look up a row by an exact key match. Not word or phrase or wildcard searching of big docs in the haystack, but a real RDBMS-style single-key lookup index. I tried creating a FIELD, but that didn't seem to do much good (the field search wasn't any faster than element-word searches).

What's interesting is that I can search other document sets and return hundreds of results in < 200 ms, but this one is really thrashing ML. I suspect it's due to the high fragmentation (3 million fragments). But what's the suggestion when the data really is flat like this? If I don't fragment it, it makes a 1 GB XML file ... which blows up ML. There's no structure in-between.

What I'm going to experiment with next is sticking this particular file in an RDBMS and using the SQL connector code ... yuck. I was really hoping not to do that.

Another idea, which I think is pretty ugly but might help, is to artificially create structure where none exists. For example, group the records by the first 2 digits of the key value into a document, reducing the fragmentation by 100x. But even getting this restructuring done is painful, because the doc is too big to load into memory, so I need to use a DB just to get at it. Which probably means I load it into an RDBMS to restructure the XML, or maybe just leave it there.

I'm sure others have had this kind of problem? Any suggestions for techniques for handling millions of "rows" of very small "records"?
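
For the RDBMS-style exact-key lookup described in the message above, the usual candidate is an element range index on the key plus cts:element-range-query. This is only a sketch, under the assumption that a string range index has been added on RXAUI in the database configuration (nothing in the thread says one exists):

    (: Sketch: exact-key lookup resolved from a range index.
       Assumes an element range index of type xs:string on RXAUI;
       without it this raises an "element range index not found" error. :)
    declare variable $id2 as xs:string := "2483417";

    cts:search(doc("/RxNorm/rxnsat.xml")/rxnsat/row,
               cts:element-range-query(xs:QName("RXAUI"), "=", $id2),
               "unfiltered")

Whether this beats the field approach mentioned above depends on the database settings, so it would need to be profiled like the other variants.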
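
For the bucketing idea (grouping rows by the first 2 digits of the key), the restructuring can in principle be done inside MarkLogic once the fragmented document is loaded, rather than round-tripping through an RDBMS. A rough sketch, with made-up bucket URIs; a real run over millions of rows would have to be split into batches (for example via xdmp:spawn) to stay within transaction limits:

    (: Sketch: write one document per 2-character RXAUI prefix.
       Assumes /RxNorm/rxnsat.xml is already loaded and fragmented on row;
       the /RxNorm/rxnsat/ bucket URIs are hypothetical. :)
    for $prefix in fn:distinct-values(
        for $row in doc("/RxNorm/rxnsat.xml")/rxnsat/row
        return fn:substring($row/RXAUI, 1, 2))
    return
      xdmp:document-insert(
        fn:concat("/RxNorm/rxnsat/", $prefix, ".xml"),
        <rxnsat>{ doc("/RxNorm/rxnsat.xml")/rxnsat/row[fn:starts-with(RXAUI, $prefix)] }</rxnsat>)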

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Jason Hunter
Sent: Sunday, December 06, 2009 12:31 AM
To: General Mark Logic Developer Discussion
Subject: Re: [MarkLogic Dev General] Interesting case where ML refuses to optimize XPath

I'm curious what speed you see for this:

    //row[RXAUI = $id2]

I'm assuming row is your fragment root.

-jh-

On Dec 5, 2009, at 6:20 AM, Lee, David wrote:

> I have 2 XML docs, each about 1 GB with about 2 million fragments ("rows") ... in fact the elements are called "rows".
> Each "row" element is about 500 bytes, but I don't yet have a better way to fragment them.
> (Yes, it's been suggested to split these into separate docs, and I may experiment with that.)
>
> Here's a case where I've found ML refuses to optimize XPaths.
>
> First off, this expression takes about 5 seconds, which I find a little slow ... it returns 8 rows.
>
>     declare variable $id := '2483417';
>     for $r in doc("/RxNorm/rxnsat.xml")/rxnsat/row[RXAUI eq $id]
>     return $r
>
> Now to complicate things, I actually need $id from a previous query, so the real query is like:
>
>     declare variable $id := '2483417';
>     declare variable $c := doc("/RxNorm/rxnconso.xml")/rxnconso/row[RXAUI eq $id];
>     declare variable $id2 as xs:string := $c/RXAUI/string();
>
>     for $r in doc("/RxNorm/rxnsat.xml")/rxnsat/row[RXAUI eq $id2]
>     return $r
>
> This takes about 1 minute! Checking the profile, I find the expression row[RXAUI eq $id] is evaluated a million times ... indicating it's not doing indexing.
>
> I've tried all sorts of combinations of these, like:
>
>     doc("/RxNorm/rxnsat.xml")/rxnsat/row[xs:string(RXAUI) eq $id2]
>     doc("/RxNorm/rxnsat.xml")/rxnsat/row[RXAUI eq $c/RXAUI]
>     doc("/RxNorm/rxnsat.xml")/rxnsat/row/RXAUI[. eq $id2]/ancestor::row
>
> All to the same avail ... no indexing!
>
> But of course this brings things back to speed:
>
>     for $r in cts:search(doc("/RxNorm/rxnsat.xml")/rxnsat/row,
>                          cts:element-query(xs:QName("RXAUI"), $id2))
>     return $r
>
> It still takes too long (about 5 sec) ... but it's back to real time at least.
>
> I'm experimenting now with fields ...
>
> But I find it strange that I can't get the XPath expression to use the indexes in one case, but it does in another that seems almost identical to me.
>
> This expression:
>
>     declare variable $id2 as xs:string := $c/RXAUI/string();
>
> should tell the system that $id2 is a single string, so why won't it use it in XPath-based index queries?
>
> ----------------------------------------
> David A. Lee
> Senior Principal Software Engineer
> Epocrates, Inc.
> [email protected]
> 812-482-5224
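
A footnote on the cts:search workaround quoted above: cts:element-query is a word query against the element's content, so an exact-value variant may also be worth timing. A minimal sketch, assuming the database's element value indexing covers RXAUI:

    (: Sketch: match the complete RXAUI value instead of a word within it.
       Assumes the relevant element indexes are enabled in the database configuration. :)
    declare variable $id2 as xs:string := "2483417";

    for $r in cts:search(doc("/RxNorm/rxnsat.xml")/rxnsat/row,
                         cts:element-value-query(xs:QName("RXAUI"), $id2))
    return $r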
