Hi Karel,

On 09/19/2011 11:33 AM, karel braeckman wrote:
> Hi Guys,
>
> I have let the dbpintegrator run over the weekend, but it got stuck
> again (same behavior as mentioned before: Virtuoso stuck at 100% CPU
> use). The good news is we think we found the problem.
>
> We found something strange in the update file with triples to delete
> where things got stuck. The file uses variables, but some of the
> variables are used more than once. For instance, the file
> 000981.removed.nt uses the ?o0 and ?o1 variables twice:
>
> <http://dbpedia.org/resource/Gaziantep>
> <http://www.w3.org/2000/01/rdf-schema#comment> ?o0 .
> <http://dbpedia.org/resource/Public_domain>
> <http://www.w3.org/2000/01/rdf-schema#comment> ?o0 .
> <http://dbpedia.org/resource/Gaziantep>
> <http://dbpedia.org/ontology/abstract> ?o1 .
> <http://dbpedia.org/resource/Public_domain>
> <http://dbpedia.org/ontology/abstract> ?o1 .

I see the problem.

> This is probably the reason why the DELETE query for this file has a
> higher complexity and tripped Virtuoso up. We changed the code of
> dbpintegrator slightly to perform a SPARQL query per line of this file
> instead of one query for the entire file. The tool is running without
> any problems so far.

This is a good solution. I'll also check the problem in the extraction
framework itself and use incremental variable numbers, so that the same
variable number is never used on more than one line.

> So perhaps something is going wrong in creating the files with the
> deletion triples?
> Best regards,
> Karel
>
> On Fri, Sep 16, 2011 at 5:53 PM, karel braeckman
> <[email protected]> wrote:
>> Hi Mohamed,
>>
>> You were right, I checked my settings and the tool does start at the
>> correct date. I must have done something wrong earlier.
>>
>> Best regards,
>> Karel
>>
>> On Fri, Sep 16, 2011 at 5:26 PM, Mohamed Morsey
>> <[email protected]> wrote:
>>> Hi Karel,
>>>
>>> On 09/16/2011 04:51 PM, karel braeckman wrote:
>>>> Hi guys,
>>>>
>>>> First of all, thanks for all the suggestions. I changed my settings
>>>> according to your suggestions, and at first the problem was the same
>>>> (100% CPU for quite a while when deleting triples) but after a while
>>>> (~10 minutes) Virtuoso completed the action and now the tool seems to
>>>> be running ok. I'm afraid I don't know which of the settings
>>>> finally got it to work.
>>> Nice to hear that :).
>>>
>>>> I have one more problem with the dbpintegrator tool however. I set the
>>>> date in my lastDownloadDate.dat to 2011-09-10-00-000000, but the tool
>>>> seems to start at the first file of the current hour
>>>> (2011-09-16-16-000001), could this be a bug?
>>> I've tried the dbpintegrator tool on my machine starting from the point you
>>> mention and it seems to work properly, so please recheck your settings and
>>> let me know if the problem still exists.
>>>
>>>> @Mohamed:
>>>> It really did take two days to fill the store with the DBpedia live
>>>> dump. Initially it was fast, but it got slower and slower. There
>>>> already was the default DBpedia dump (not the live version) inserted
>>>> into another graph, maybe the amount of triples is just too large? How
>>>> long should it (more or less) take to load the live dump into Virtuoso
>>>> you think?
>>> Not 100% sure but it should take something like 3-4 hours.
>>>
>>>> Virtuoso was running for a few weeks the first time I tried to run the
>>>> sync tool. Since then, I restarted it a few times after changing
>>>> config files and trying to debug things.
>>> Exactly, this is what I meant, a restart could be helpful.
>>>
>>>> @Patrick, @Kingsley:
>>>> The version I am using is Version: 06.01.3127, Build: Mar 16 2011 of
>>>> VOS. I downloaded and compiled it (on Ubuntu 10.04.2 LTS).
>>>>
>>>> The lastDownloadDate.dat file contains 2011-09-16-16-000555 at the
>>>> moment of writing (the tool is working now).
>>>>
>>>> Best regards and thanks for the hints,
>>>> Karel
>>>>
>>>> On Fri, Sep 16, 2011 at 3:53 PM, Patrick van Kleef
>>>> <[email protected]> wrote:
>>>>> Hi Karel,
>>>>>
>>>>>> Forgot to mention the machine details:
>>>>>>
>>>>>> 24GB RAM
>>>>>> 2x quadcore Xeon E5540 2.5GHz
>>>>>> Virtuoso data is on SSD disks
>>>>> Your parameters look ok, but you may want to try adding the following:
>>>>>
>>>>> [Parameters]
>>>>> ...
>>>>> DefaultIsolation = 2
>>>>> ...
>>>>>
>>>>> which sets a different transaction isolation level which is more suitable
>>>>> for situations where updates/deletes and queries are done on the same
>>>>> server.
>>>>>
>>>>>
>>>>> Did you also set your Linux kernel swappiness parameter as per the
>>>>> following Tips and Tricks article:
>>>>>
>>>>> http://www.openlinksw.com/dataspace/dav/wiki/Main/VirtTipsAndTricksGuideRDFPerformanceTuning
>>>>>
>>>>> If not, then your Linux kernel may start swapping out parts of your
>>>>> Virtuoso process pages in favor of filesystem cache, which will seriously
>>>>> hurt Virtuoso's performance.
>>>>>
>>>>> In general you should make sure your system never starts swapping.
>>>>>
>>>>>
>>>>> Can you tell me the exact version of VOS you are using on your system and
>>>>> whether you are using the OS supplied version or if you compiled and
>>>>> installed it yourself. Note that the current version of Virtuoso
>>>>> OpenSource is 6.1.3 from:
>>>>>
>>>>> http://sourceforge.net/projects/virtuoso/files/
>>>>>
>>>>> However if you are running an older version and are not afraid to do a
>>>>> build yourself, I would like to give you access to a prerelease of the
>>>>> upcoming 6.1.4 which has a number of new optimisations and fixes that may
>>>>> be of benefit.
>>>>>
>>>>>
>>>>> Lastly on the subject of this dbpintegrator part, can you tell me the
>>>>> content of the file:
>>>>>
>>>>> lastDownloadDate.dat
>>>>>
>>>>>
>>>>> Patrick
>>>>>
>>>
>>> --
>>> Kind Regards
>>> Mohamed Morsey
>>> Department of Computer Science
>>> University of Leipzig
>>>
> ------------------------------------------------------------------------------
> BlackBerry® DevCon Americas, Oct. 18-20, San Francisco, CA
> Learn about the latest advances in developing for the
> BlackBerry® mobile platform with sessions, labs & more.
> See new tools and technologies. Register for BlackBerry® DevCon today!
> http://p.sf.net/sfu/rim-devcon-copy1
> _______________________________________________
> Dbpedia-discussion mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>

--
Kind Regards
Mohamed Morsey
Department of Computer Science
University of Leipzig
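
The two fixes discussed in this thread (one query per line of the removed-triples file, and variable numbers that are unique per line) can be sketched roughly as below. A variable shared between two lines makes the query engine treat those lines as a join on that variable, which is the likely source of the extra complexity Karel describes. This is only an illustrative Python sketch and not the dbpintegrator or extraction framework code: the function names renumber_variables and per_line_delete_queries, the graph IRI http://example.org/graph, and the SPARQL 1.1 "DELETE WHERE" form are all assumptions, since the thread does not show the query that dbpintegrator actually sends to Virtuoso.

```python
import re

# Matches either a full IRI (<...>) or a SPARQL variable such as ?o0.
# IRIs are matched first so that a "?" inside an IRI is never renamed.
TOKEN = re.compile(r"<[^>]*>|\?[A-Za-z_][A-Za-z0-9_]*")


def renumber_variables(patterns):
    """Rename variables so no variable name is shared between lines.

    Within one line the same variable keeps a single new name; across
    lines every variable gets a fresh number (?o0, ?o1, ?o2, ...).
    Assumption: literals in the patterns do not contain "?name" text.
    """
    counter = 0
    renamed = []
    for pattern in patterns:
        mapping = {}  # old variable name -> new name, for this line only

        def fresh(match):
            nonlocal counter
            token = match.group(0)
            if token.startswith("<"):
                return token  # leave IRIs untouched
            if token not in mapping:
                mapping[token] = f"?o{counter}"
                counter += 1
            return mapping[token]

        renamed.append(TOKEN.sub(fresh, pattern))
    return renamed


def per_line_delete_queries(patterns, graph_iri):
    """Build one small delete query per triple pattern (hypothetical form)."""
    queries = []
    for pattern in patterns:
        pattern = pattern.strip()
        if not pattern:
            continue  # skip blank lines in the file
        queries.append(
            f"DELETE WHERE {{ GRAPH <{graph_iri}> {{ {pattern} }} }}"
        )
    return queries


if __name__ == "__main__":
    removed = [
        "<http://dbpedia.org/resource/Gaziantep> "
        "<http://www.w3.org/2000/01/rdf-schema#comment> ?o0 .",
        "<http://dbpedia.org/resource/Public_domain> "
        "<http://www.w3.org/2000/01/rdf-schema#comment> ?o0 .",
    ]
    for line in renumber_variables(removed):
        print(line)
    for query in per_line_delete_queries(removed, "http://example.org/graph"):
        print(query)
```

Either approach alone removes the accidental join: with per-line queries the shared name no longer matters, and with renumbered variables a single query for the whole file no longer ties unrelated lines together.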
