Thanks Kevin! We'll give that a try. Though in our case the time filter will most likely do the most to cut down the search space, as the majority of the messages have the same app_id.
One thought: should a query that works with large datasets be able to fail because it runs out of memory, as opposed to using disk and running slower? Are there certain things it must hold in memory?

Yours truly,
Alexander

2015-04-22 18:05 GMT+02:00 Kevin Ford <[email protected]>:

> Hi Alexander,
>
> In addition to what Charles said, try doing this instead:
>
> # FILTER( ?_app_id = 'APP_ID')
> ?HD_1 t5_m:hD.1Value 'APP_ID' .
>
> I think that is probably the most important enhancement you could make,
> to the query at least.
>
> To better see what you were doing, I re-ordered your query statements.
> Note, however, that I’ve not found the order of statements to be terribly
> important with MarkLogic’s triple store (as I have experienced, however,
> with other triple stores) but I’ve copied and pasted the re-ordered query
> below. It may or may not be helpful.
>
> Yours,
> Kevin
>
> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
> PREFIX t5_m: <http://url.com/T5/model#>
> PREFIX t5_d: <http://url.com/T5/data#>
>
> SELECT DISTINCT ?_app_id ( ?_err as ?_reason ) ?_comment ?_severity
> WHERE
> {
>   # FILTER( ?_app_id = 'APP_ID')
>   ?HD_1 t5_m:hD.1Value 'APP_ID' .
>
>   ?MSH_5 t5_m:hasHD.1 ?HD_1 .
>   ?MSH t5_m:hasMSH.5 ?MSH_5 .
>
>   ?ACK t5_m:hasMSH ?MSH .
>   ?ACK t5_m:hasMSA ?MSA .
>
>   ?MSA t5_m:hasMSA.2 ?MSA_2 .
>   ?MSA_2 t5_m:mSA.2Value ?_msg_id .
>
>   ?PCD_01_Message t5_m:id ?_msg_id .
>   ?PCD_01_Message a t5_m:PCD_01_Message .
>   ?PCD_01_Message t5_m:timeStamp ?_ts .
>
>   FILTER ( ?_ts >= '2015-04-21T09:04:07.871' )
>   FILTER ( ?_ts <= '2015-04-21T09:07:43.973' )
>
>   ?ACK t5_m:hasERR ?ERR .
>   ?ERR t5_m:hasERR.7 ?ERR_7 .
>   ?ERR_7 t5_m:eRR.7Value ?_err .
>
>   BIND ( 3 as ?_severity)
>   BIND ( "Generated by HL7 v2 Conformance Profile of IHE PCD-01 message" as ?_comment )
> }
>
> From: Charles Greer <[email protected]>
> Reply-To: MarkLogic Developer Discussion <[email protected]>
> Date: Wednesday, April 22, 2015 at 10:47 AM
> To: MarkLogic Developer Discussion <[email protected]>
> Subject: Re: [MarkLogic Dev General] SPARQL queries causing XDMP-MEMCANCELED
>
> Hi Alexander,
>
> The error you're getting is indicative of a timeout -- a long-running query
> or a runaway memory usage.
>
> Follow up questions and suggestions -
> 1. How are you running this query? via SPARQL endpoint or a built-in?
> 2. Are there any default inference settings on the database? (from your
>    query I'd guess no, but that would affect the server a lot)
>
> If you're using a built-in to run the query, I'd try changing the
> "optimize" option, which may lead to a better query plan.
>
> I'd also suggest using 8.0-2 as there have been rapid improvements in
> speed and query processing throughout the dev cycles.
>
> If neither suggestion gets you anywhere, this is probably a query and
> dataset we'd want to take a closer look at... the query doesn't look like
> one I'd expect to have issues.
>
> Charles
>
> ------------------------------
> *From:* [email protected] [[email protected]] on behalf of Takacs [[email protected]]
> *Sent:* Wednesday, April 22, 2015 8:30 AM
> *To:* [email protected]
> *Subject:* [MarkLogic Dev General] SPARQL queries causing XDMP-MEMCANCELED
>
> Hi,
>
> We are performing a series of different SPARQL queries on a database
> containing ~5 million triples.
>
> Our queries often cause an XDMP-MEMCANCELED error, though not
> consistently; they mostly return a correct result within a few seconds or
> less. Some queries occasionally seem to hang and cause the server to run
> at 100% CPU until the query times out.
>
> We have tried increasing what memory-related settings we could find.
> This query runs fine on other triple stores/engines.
>
> We are running MarkLogic 8.0-11 AWS instance with 8 GB of internal
> memory and 16 GB of swap space.
>
> Example of a query that sometimes causes the error:
>
> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
> PREFIX t5_m: <http://url.com/T5/model#>
> PREFIX t5_d: <http://url.com/T5/data#>
> SELECT DISTINCT
>   ?_app_id
>   ( ?_err as ?_reason )
>   ?_comment
>   ?_severity
> WHERE
> {
>   BIND ( 3 as ?_severity)
>   BIND ( "Generated by HL7 v2 Conformance Profile of IHE PCD-01 message" as ?_comment )
>
>   FILTER( ?_app_id = 'APP_ID')
>   FILTER ( ?_ts >= '2015-04-21T09:04:07.871' )
>   FILTER ( ?_ts <= '2015-04-21T09:07:43.973' )
>
>   ?ACK t5_m:hasMSH ?MSH .
>   ?MSH t5_m:hasMSH.5 ?MSH_5 .
>   ?MSH_5 t5_m:hasHD.1 ?HD_1 .
>   ?HD_1 t5_m:hD.1Value ?_app_id .
>
>   ?ACK t5_m:hasMSA ?MSA .
>   ?MSA t5_m:hasMSA.2 ?MSA_2 .
>   ?MSA_2 t5_m:mSA.2Value ?_msg_id .
>
>   ?PCD_01_Message a t5_m:PCD_01_Message .
>   ?PCD_01_Message t5_m:id ?_msg_id .
>   ?PCD_01_Message t5_m:timeStamp ?_ts .
>
>   ?ACK t5_m:hasERR ?ERR .
>   ?ERR t5_m:hasERR.7 ?ERR_7 .
>   ?ERR_7 t5_m:eRR.7Value ?_err .
> }
>
> Is there something in the configuration that we are missing or does the
> query itself have issues?
>
> Thanks a lot for any suggestions!
>
> Yours truly, Alexander
>
> _______________________________________________
> General mailing list
> [email protected]
> Manage your subscription at:
> http://developer.marklogic.com/mailman/listinfo/general
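
One detail worth checking in the re-ordered query quoted above: with the literal 'APP_ID' placed directly in the triple pattern, ?_app_id is still in the SELECT list but is no longer bound by any pattern, so that column would presumably come back unbound. A minimal sketch of one way to keep it, assuming the column is still wanted in the results, is to bind the constant explicitly. The BIND for ?_app_id is an addition for illustration and is not part of either query in the thread; everything else is taken from the re-ordered query.

```sparql
PREFIX t5_m: <http://url.com/T5/model#>

SELECT DISTINCT ?_app_id ( ?_err AS ?_reason ) ?_comment ?_severity
WHERE
{
  # Literal in the pattern, replacing FILTER( ?_app_id = 'APP_ID' )
  ?HD_1  t5_m:hD.1Value 'APP_ID' .
  ?MSH_5 t5_m:hasHD.1   ?HD_1 .
  ?MSH   t5_m:hasMSH.5  ?MSH_5 .

  ?ACK   t5_m:hasMSH ?MSH .
  ?ACK   t5_m:hasMSA ?MSA .
  ?MSA   t5_m:hasMSA.2   ?MSA_2 .
  ?MSA_2 t5_m:mSA.2Value ?_msg_id .

  ?PCD_01_Message t5_m:id ?_msg_id .
  ?PCD_01_Message a t5_m:PCD_01_Message .
  ?PCD_01_Message t5_m:timeStamp ?_ts .

  # Time window, unchanged from the original query
  FILTER ( ?_ts >= '2015-04-21T09:04:07.871' )
  FILTER ( ?_ts <= '2015-04-21T09:07:43.973' )

  ?ACK   t5_m:hasERR ?ERR .
  ?ERR   t5_m:hasERR.7   ?ERR_7 .
  ?ERR_7 t5_m:eRR.7Value ?_err .

  # Assumption: re-introduce the constant so ?_app_id appears in the results
  BIND ( 'APP_ID' AS ?_app_id )
  BIND ( 3 AS ?_severity )
  BIND ( "Generated by HL7 v2 Conformance Profile of IHE PCD-01 message" AS ?_comment )
}
```

Since ?_app_id is not used by any other pattern in the group, the BIND should be legal at that position; whether it affects the query plan on MarkLogic is not something this sketch can confirm.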
