Hi guys,

I let the dbpintegrator run over the weekend, but it got stuck again (same behavior as mentioned before: Virtuoso stuck at 100% CPU use). The good news is that we think we have found the problem.

We found something strange in the update file with the triples to delete, at the point where things got stuck. The file uses variables, but some of the variables are used more than once. For instance, the file 000981.removed.nt uses the ?o0 and ?o1 variables twice:

<http://dbpedia.org/resource/Gaziantep> <http://www.w3.org/2000/01/rdf-schema#comment> ?o0 .
<http://dbpedia.org/resource/Public_domain> <http://www.w3.org/2000/01/rdf-schema#comment> ?o0 .
<http://dbpedia.org/resource/Gaziantep> <http://dbpedia.org/ontology/abstract> ?o1 .
<http://dbpedia.org/resource/Public_domain> <http://dbpedia.org/ontology/abstract> ?o1 .

This is probably the reason why the DELETE query for this file has a higher complexity and tripped Virtuoso up. We changed the code of dbpintegrator slightly to perform one SPARQL query per line of this file instead of one query for the entire file. The tool has been running without any problems so far.

So perhaps something is going wrong when the files with the deletion triples are created?

Best regards,
Karel

On Fri, Sep 16, 2011 at 5:53 PM, karel braeckman <[email protected]> wrote:
> Hi Mohamed,
>
> You were right, I checked my settings and the tool does start at the
> correct date. I must have done something wrong earlier.
>
> Best regards,
> Karel
>
> On Fri, Sep 16, 2011 at 5:26 PM, Mohamed Morsey
> <[email protected]> wrote:
>> Hi Karel,
>>
>> On 09/16/2011 04:51 PM, karel braeckman wrote:
>>>
>>> Hi guys,
>>>
>>> First of all, thanks for all the suggestions. I changed my settings
>>> according to your suggestions, and at first the problem was the same
>>> (100% CPU for quite a while when deleting triples) but after a while
>>> (~10 minutes) Virtuoso completed the action and now the tool seems to
>>> be running ok. I'm afraid I don't know which of the settings
>>> finally got it to work.
>>
>> Nice to hear that :).
>>
>>> I have one more problem with the dbpintegrator tool however.
>>> I set the date in my lastDownloadDate.dat to 2011-09-10-00-000000, but the
>>> tool seems to start at the first file of the current hour
>>> (2011-09-16-16-000001), could this be a bug?
>>
>> I've tried the dbpintegrator tool on my machine starting from the point you
>> mention and it seems to work properly, so please recheck your settings and
>> let me know if the problem still exists.
>>
>>> @Mohamed:
>>> It really did take two days to fill the store with the DBpedia live
>>> dump. Initially it was fast, but it got slower and slower. There
>>> already was the default DBpedia dump (not the live version) inserted
>>> into another graph, maybe the amount of triples is just too large? How
>>> long should it (more or less) take to load the live dump into Virtuoso,
>>> do you think?
>>
>> Not 100% sure but it should take something like 3-4 hours.
>>
>>> Virtuoso was running for a few weeks the first time I tried to run the
>>> sync tool. Since then, I restarted it a few times after changing
>>> config files and trying to debug things.
>>
>> Exactly, this is what I meant, a restart could be helpful.
>>
>>> @Patrick, @Kingsley:
>>> The version I am using is Version: 06.01.3127, Build: Mar 16 2011 of
>>> VOS. I downloaded and compiled it (on Ubuntu 10.04.2 LTS).
>>>
>>> The lastDownloadDate.dat file contains 2011-09-16-16-000555 at the
>>> moment of writing (the tool is working now).
>>>
>>> Best regards and thanks for the hints,
>>> Karel
>>>
>>> On Fri, Sep 16, 2011 at 3:53 PM, Patrick van Kleef
>>> <[email protected]> wrote:
>>>>
>>>> Hi Karel,
>>>>
>>>>> Forgot to mention the machine details:
>>>>>
>>>>> 24GB RAM
>>>>> 2x quadcore Xeon E5540 2.5GHz
>>>>> Virtuoso data is on SSD disks
>>>>
>>>> Your parameters look ok, but you may want to try adding the following:
>>>>
>>>> [Parameters]
>>>> ...
>>>> DefaultIsolation = 2
>>>> ...
>>>>
>>>> which sets a different transaction isolation level that is more suitable
>>>> for situations where updates/deletes and queries are done on the same
>>>> server.
>>>>
>>>> Did you also set your Linux kernel swappiness parameter as per the
>>>> following Tips and Tricks article:
>>>>
>>>> http://www.openlinksw.com/dataspace/dav/wiki/Main/VirtTipsAndTricksGuideRDFPerformanceTuning
>>>>
>>>> If not, then your Linux kernel may start swapping out parts of your
>>>> Virtuoso process pages in favor of filesystem cache, which will seriously
>>>> hurt Virtuoso's performance.
>>>>
>>>> In general you should make sure your system never starts swapping.
>>>>
>>>> Can you tell me the exact version of VOS you are using on your system and
>>>> whether you are using the OS supplied version or if you compiled and
>>>> installed it yourself. Note that the current version of Virtuoso
>>>> OpenSource is 6.1.3 from:
>>>>
>>>> http://sourceforge.net/projects/virtuoso/files/
>>>>
>>>> However, if you are running an older version and are not afraid to do a
>>>> build yourself, I would like to give you access to a prerelease of the
>>>> upcoming 6.1.4, which has a number of new optimisations and fixes that
>>>> may be of benefit.
>>>>
>>>> Lastly, on the subject of this dbpintegrator part, can you tell me the
>>>> content of the file:
>>>>
>>>> lastDownloadDate.dat
>>>>
>>>> Patrick
>>>>
>>
>> --
>> Kind Regards
>> Mohamed Morsey
>> Department of Computer Science
>> University of Leipzig
>>
>
_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
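
For reference, here is a minimal sketch of the per-line workaround described at the top of this message, written in Python rather than taken from the dbpintegrator code. Within a single SPARQL query, a variable that appears in several patterns has to bind to the same value in all of them, so the patterns get joined. Issuing one request per line avoids that. The graph IRI, the function names and the `DELETE WHERE` form (SPARQL 1.1 Update) are assumptions made for illustration: the thread does not show the query dbpintegrator actually sends, and Virtuoso 6.1 may expect a different update dialect. The sketch also assumes that variables only occur as the last term of a line, as in the sample from 000981.removed.nt.

```python
import re
from collections import Counter
from typing import Dict, List

# Hypothetical graph IRI, used only for illustration.
GRAPH_IRI = "http://example.org/dbpedia-live"

# A SPARQL variable such as ?o0 (simplified pattern).
VARIABLE = re.compile(r"\?\w+")


def split_patterns(text: str) -> List[str]:
    """Return one triple pattern per non-empty line, without the final '.'."""
    patterns = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith("."):
            line = line[:-1].rstrip()
        if line:
            patterns.append(line)
    return patterns


def repeated_variables(text: str) -> Dict[str, int]:
    """Count variables that are the last term of more than one pattern."""
    counts = Counter()
    for pattern in split_patterns(text):
        last_term = pattern.split()[-1]
        if VARIABLE.fullmatch(last_term):
            counts[last_term] += 1
    return {name: n for name, n in counts.items() if n > 1}


def per_line_deletes(text: str, graph_iri: str = GRAPH_IRI) -> List[str]:
    """Build one DELETE WHERE request per pattern, so no variable is shared."""
    return [
        f"DELETE WHERE {{ GRAPH <{graph_iri}> {{ {p} . }} }}"
        for p in split_patterns(text)
    ]


# The four lines quoted from 000981.removed.nt.
SAMPLE = """\
<http://dbpedia.org/resource/Gaziantep> <http://www.w3.org/2000/01/rdf-schema#comment> ?o0 .
<http://dbpedia.org/resource/Public_domain> <http://www.w3.org/2000/01/rdf-schema#comment> ?o0 .
<http://dbpedia.org/resource/Gaziantep> <http://dbpedia.org/ontology/abstract> ?o1 .
<http://dbpedia.org/resource/Public_domain> <http://dbpedia.org/ontology/abstract> ?o1 .
"""

if __name__ == "__main__":
    print("variables used more than once:", repeated_variables(SAMPLE))
    for query in per_line_deletes(SAMPLE):
        print(query)
```

The `repeated_variables` check could also be run over the existing .removed.nt files to see how many of them are affected before changing anything on the server.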
