Hi Andy, interesting - thanks for sharing the info. It would be interesting to know the load performance of tdbloader/tdbloader2 with your new SSD.
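Something along these lines should do it - a sketch only: it assumes the TDB scripts are on the PATH, that the directories under /mnt/ssd1 are placeholder database locations on the SSD, and that the DBpedia file has been decompressed first:

  # standard bulk loader
  tdbloader --loc /mnt/ssd1/tdb mappingbased_properties_en.nt

  # newer two-phase loader, into a fresh location
  tdbloader2 --loc /mnt/ssd1/tdb2 mappingbased_properties_en.nt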
Paolo

Andy Seaborne wrote:
> Hi there,
>
> I now have an SSD (256G from Crucial) :-)
>
>   /dev/sdb1 on /mnt/ssd1 type ext4 (rw,noatime)
>
> and I ran the test program on jamendo-rdf and on
> mappingbased_properties_en.nt, then on jamendo-rdf with existing data as
> in the test case.
>
> Everything works for me - the loads complete without an exception.
>
> Andy
>
> On 21/06/11 09:10, Andy Seaborne wrote:
>>
>> On 21/06/11 06:01, jp wrote:
>>> Hey Andy
>>>
>>> I wasn't able to unzip the file
>>> http://people.apache.org/~andy/jamendo.nt.gz however I ran it on my
>>> dataset and I received an out of memory exception. I then changed line
>>> 42 to true and received the original error. You can download the data
>>> file I have been testing with from
>>> http://www.kosmyna.com/mappingbased_properties_en.nt.bz2 - unzipped it's
>>> 2.6GB. This file has consistently failed to load.
>>
>> downloads.dbpedia.org is back - I downloaded that file and loaded it with
>> the test program - no problems.
>>
>>> While trying other datasets and variations of the simple program I had
>>> what seemed to be a successful BulkLoad, however when I opened the
>>> dataset and tried to query it there were no results. I don't have the
>>> exact details of this run but can try to reproduce it if you think it
>>> would be useful.
>>
>> Yes please. At this point, any details are a help.
>>
>> Also, a complete log of the failed load of
>> mappingbased_properties_en.nt.bz2 would be useful.
>>
>> Having looked at the stacktraces, and aligned them to the source code,
>> it appears the code passes an internal consistency check, then fails on
>> something that the same check tests for.
>>
>> Andy
>>
>>> -jp
>>>
>>> On Mon, Jun 20, 2011 at 4:57 PM, Andy Seaborne
>>> <[email protected]> wrote:
>>>> Fixed - sorry about that.
>>>>
>>>> Andy
>>>>
>>>> On 20/06/11 21:50, jp wrote:
>>>>> Hey Andy,
>>>>>
>>>>> I assume the file you want me to run is
>>>>> http://people.apache.org/~andy/ReportLoadOnSSD.java
>>>>>
>>>>> When I try to download it I get a permissions error. Let me know when
>>>>> I should try again.
>>>>>
>>>>> -jp
>>>>>
>>>>> On Mon, Jun 20, 2011 at 3:30 PM, Andy Seaborne
>>>>> <[email protected]> wrote:
>>>>>> Hi there,
>>>>>>
>>>>>> I tried to recreate this but couldn't, though I don't have an SSD to
>>>>>> hand at the moment (being fixed :-)
>>>>>>
>>>>>> I've put my test program and the data from the jamendo-rdf you sent
>>>>>> me in:
>>>>>>
>>>>>>   http://people.apache.org/~andy/
>>>>>>
>>>>>> so we can agree on an exact test case. This code is single threaded.
>>>>>>
>>>>>> The conversion from .rdf to .nt wasn't pure.
>>>>>>
>>>>>> I tried running using the in-memory store as well.
>>>>>> downloads.dbpedia.org was down at the weekend - I'll try to get the
>>>>>> same dbpedia data.
>>>>>>
>>>>>> Could you run exactly what I was running? The file name needs
>>>>>> changing.
>>>>>>
>>>>>> You can also try uncommenting
>>>>>>   SystemTDB.setFileMode(FileMode.direct) ;
>>>>>> and run it using non-mapped files in about 1.2G of heap.
>>>>>>
>>>>>> Looking through the stacktrace, there is a point where the code has
>>>>>> passed an internal consistency test then fails with something that
>>>>>> should be caught by that test - and the code is sync'ed or single
>>>>>> threaded. This is, to put it mildly, worrying.
>>>>>>
>>>>>> Andy
>>>>>>
>>>>>> On 18/06/11 16:38, jp wrote:
>>>>>>> Hey Andy,
>>>>>>>
>>>>>>> My entire program is run in one JVM, as follows.
>>>>>>>
>>>>>>>   public static void main(String[] args) throws IOException {
>>>>>>>       DatasetGraphTDB datasetGraph =
>>>>>>>           TDBFactory.createDatasetGraph(tdbDir);
>>>>>>>
>>>>>>>       /* I saw the BulkLoader had two ways of loading data based on
>>>>>>>          whether the dataset existed already. I did two runs, one
>>>>>>>          with the following two lines commented out, to test both
>>>>>>>          ways the BulkLoader runs. Hopefully this had the desired
>>>>>>>          effect. */
>>>>>>>       datasetGraph.getDefaultGraph().add(new
>>>>>>>           Triple(Node.createURI("urn:hello"), RDF.type.asNode(),
>>>>>>>           Node.createURI("urn:house")));
>>>>>>>       datasetGraph.sync();
>>>>>>>
>>>>>>>       InputStream inputStream = new FileInputStream(dbpediaData);
>>>>>>>
>>>>>>>       BulkLoader bulkLoader = new BulkLoader();
>>>>>>>       bulkLoader.loadDataset(datasetGraph, inputStream, true);
>>>>>>>   }
>>>>>>>
>>>>>>> The data can be found here:
>>>>>>> http://downloads.dbpedia.org/3.6/en/mappingbased_properties_en.nt.bz2
>>>>>>>
>>>>>>> I appended the ontology to the end of the file; it can be found here:
>>>>>>> http://downloads.dbpedia.org/3.6/dbpedia_3.6.owl.bz2
>>>>>>>
>>>>>>> The tdbDir is an empty directory.
>>>>>>> On my system the error starts occurring after about 2-3 minutes and
>>>>>>> 8-12 million triples loaded.
>>>>>>>
>>>>>>> Thanks for looking over this and please let me know if I can be of
>>>>>>> further assistance.
>>>>>>>
>>>>>>> -jp
>>>>>>> [email protected]
>>>>>>>
>>>>>>> On Jun 17, 2011 9:29 am, andy wrote:
>>>>>>>> jp,
>>>>>>>>
>>>>>>>> How does this fit with running:
>>>>>>>>
>>>>>>>>   datasetGraph.getDefaultGraph().add(new
>>>>>>>>       Triple(Node.createURI("urn:hello"), RDF.type.asNode(),
>>>>>>>>       Node.createURI("urn:house")));
>>>>>>>>   datasetGraph.sync();
>>>>>>>>
>>>>>>>> Is the preload of one triple in a separate JVM or the same JVM as
>>>>>>>> the BulkLoader call - could you provide a single complete minimal
>>>>>>>> example?
>>>>>>>>
>>>>>>>> In attempting to reconstruct this, I don't want to hide the
>>>>>>>> problem by guessing how things are wired together.
>>>>>>>>
>>>>>>>> Also - exactly which dbpedia file are you loading (URL?), although
>>>>>>>> I doubt the exact data is the cause here.
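For anyone trying to reproduce this, here is a self-contained version of the program jp describes above. A sketch only: the class name, TDB_DIR and DATA_FILE are placeholders, the import paths are a best guess at the TDB 0.8.x package layout of the time, and the commented-out FileMode.direct line is Andy's suggestion for running with non-mapped files in about 1.2G of heap.

  import java.io.FileInputStream;
  import java.io.IOException;
  import java.io.InputStream;

  import com.hp.hpl.jena.graph.Node;
  import com.hp.hpl.jena.graph.Triple;
  import com.hp.hpl.jena.tdb.TDBFactory;
  import com.hp.hpl.jena.tdb.base.file.FileMode;
  import com.hp.hpl.jena.tdb.store.DatasetGraphTDB;
  import com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader;
  import com.hp.hpl.jena.tdb.sys.SystemTDB;
  import com.hp.hpl.jena.vocabulary.RDF;

  public class ReproduceBulkLoad {
      // Placeholders: an empty TDB directory and the N-Triples data file.
      static final String TDB_DIR = "/mnt/ssd1/tdb";
      static final String DATA_FILE = "mappingbased_properties_en.nt";

      public static void main(String[] args) throws IOException {
          // Uncomment to use non-memory-mapped files (~1.2G of heap needed):
          // SystemTDB.setFileMode(FileMode.direct);

          DatasetGraphTDB datasetGraph =
              TDBFactory.createDatasetGraph(TDB_DIR);

          // Pre-load one triple and sync, so the BulkLoader takes its
          // existing-dataset code path; comment out these two statements
          // to exercise the fresh-dataset path instead.
          datasetGraph.getDefaultGraph().add(
              new Triple(Node.createURI("urn:hello"),
                         RDF.type.asNode(),
                         Node.createURI("urn:house")));
          datasetGraph.sync();

          InputStream inputStream = new FileInputStream(DATA_FILE);

          // Same call as in the original report.
          BulkLoader bulkLoader = new BulkLoader();
          bulkLoader.loadDataset(datasetGraph, inputStream, true);
      }
  }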
