tdbloader2 was able to load the file. The log can be found here:
http://www.kosmyna.com/tdbloader2.log

I guess the questions now are: what's the difference between tdbloader2 and
the test application? And why does tdbloader fail?
-jp

On Tue, Jun 28, 2011 at 7:21 PM, jp <[email protected]> wrote:
> Sorry for any confusion; tdbloader2 is working fine, I had a typo in my
> $PATH variable. I'll post results of the load asap.
>
> -jp
>
> On Tue, Jun 28, 2011 at 7:02 PM, jp <[email protected]> wrote:
>> The complete log file is over 13 GB. I have posted the first 5000 lines
>> here: http://www.kosmyna.com/ReportLoadOnSSD.log.5000lines
>>
>> The run of tdbloader failed as well; the first 5000 lines can be found
>> here: http://www.kosmyna.com/tdbloader.log.5000lines
>>
>> I could not run tdbloader2. I get the following error:
>> ./tdbloader2: line 14: make_classpath: command not found
>>
>> I have the TDBROOT environment variable correctly set and am using this
>> version of TDB:
>> http://svn.apache.org/repos/asf/incubator/jena/Jena2/TDB/tags/TDB-0.8.10/bin
>>
>> -jp
>>
>> On Tue, Jun 28, 2011 at 4:30 PM, Andy Seaborne <[email protected]> wrote:
>>>> Aside from shipping you my laptop, is there anything I can provide you
>>>> with to help track down the issue?
>>>
>>> A complete log, with the exception, would help to identify the point
>>> where it fails. It's a possible clue.
>>>
>>> Could you also try running tdbloader and tdbloader2 to bulk load the
>>> files?
>>>
>>> Andy
>>>
>>> On 28/06/11 21:19, jp wrote:
>>>>
>>>> Hey Andy,
>>>>
>>>> Saw the twitter message; a 29% load speed increase is pretty nice. Glad
>>>> I could give you the excuse to upgrade :) Though it worries me that you
>>>> don't receive the same exception I do. I consistently have loading
>>>> issues using the file posted at
>>>> http://www.kosmyna.com/mappingbased_properties_en.nt.bz2. I can get
>>>> the test program to complete by making the following changes, but it's
>>>> slow (30 minutes).
>>>>
>>>> SystemTDB.setFileMode(FileMode.direct) ;
>>>>
>>>> if ( true ) {
>>>>     String dir = "/home/jp/scratch/ssdtest/DB-X" ;
>>>>     FileOps.clearDirectory(dir) ;
>>>>     datasetGraph = TDBFactory.createDatasetGraph(dir);
>>>> }
>>>>
>>>> Running the program with the sections of code below fails every time.
>>>>
>>>> //SystemTDB.setFileMode(FileMode.direct) ;
>>>>
>>>> if ( true ) {
>>>>     String dir = "/home/jp/scratch/ssdtest/DB-X" ;
>>>>     FileOps.clearDirectory(dir) ;
>>>>     datasetGraph = TDBFactory.createDatasetGraph(dir);
>>>> }
>>>>
>>>> The exception:
>>>> java.lang.IllegalArgumentException
>>>>     at java.nio.Buffer.position(Buffer.java:235)
>>>>     at com.hp.hpl.jena.tdb.base.record.RecordFactory.buildFrom(RecordFactory.java:94)
>>>>     at com.hp.hpl.jena.tdb.base.buffer.RecordBuffer._get(RecordBuffer.java:95)
>>>>     at com.hp.hpl.jena.tdb.base.buffer.RecordBuffer.get(RecordBuffer.java:41)
>>>>     at com.hp.hpl.jena.tdb.index.bplustree.BPTreeRecords.getSplitKey(BPTreeRecords.java:141)
>>>>     at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNode.split(BPTreeNode.java:435)
>>>>     at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:387)
>>>>     at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:399)
>>>>     at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNode.insert(BPTreeNode.java:167)
>>>>     at com.hp.hpl.jena.tdb.index.bplustree.BPlusTree.addAndReturnOld(BPlusTree.java:297)
>>>>     at com.hp.hpl.jena.tdb.index.bplustree.BPlusTree.add(BPlusTree.java:289)
>>>>     at com.hp.hpl.jena.tdb.index.TupleIndexRecord.performAdd(TupleIndexRecord.java:48)
>>>>     at com.hp.hpl.jena.tdb.index.TupleIndexBase.add(TupleIndexBase.java:49)
>>>>     at com.hp.hpl.jena.tdb.index.TupleTable.add(TupleTable.java:54)
>>>>     at com.hp.hpl.jena.tdb.nodetable.NodeTupleTableConcrete.addRow(NodeTupleTableConcrete.java:77)
>>>>     at com.hp.hpl.jena.tdb.store.bulkloader.LoaderNodeTupleTable.load(LoaderNodeTupleTable.java:112)
>>>>     at com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader$2.send(BulkLoader.java:268)
>>>>     at com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader$2.send(BulkLoader.java:244)
>>>>     at org.openjena.riot.lang.LangNTuple.runParser(LangNTuple.java:60)
>>>>     at org.openjena.riot.lang.LangBase.parse(LangBase.java:71)
>>>>     at org.openjena.riot.RiotReader.parseQuads(RiotReader.java:122)
>>>>     at com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader.loadQuads$(BulkLoader.java:159)
>>>>     at com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader.loadDataset(BulkLoader.java:117)
>>>>     at com.nimblegraph.data.bin.ReportLoadOnSSD.main(ReportLoadOnSSD.java:68)
>>>>
>>>> http://dbpedia.org/resource/Spirea_X
>>>> http://dbpedia.org/ontology/associatedBand
>>>> http://dbpedia.org/resource/Adventures_in_Stereo
>>>>
>>>> If I continue to let it run I start seeing this error as well:
>>>> com.hp.hpl.jena.tdb.TDBException: No known block type for 4
>>>>     at com.hp.hpl.jena.tdb.base.block.BlockType.extract(BlockType.java:64)
>>>>     at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNodeMgr.getType(BPTreeNodeMgr.java:166)
>>>>     at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNodeMgr.access$200(BPTreeNodeMgr.java:22)
>>>>     at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNodeMgr$Block2BPTreeNode.fromByteBuffer(BPTreeNodeMgr.java:136)
>>>>     at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNodeMgr.get(BPTreeNodeMgr.java:84)
>>>>     at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNode.get(BPTreeNode.java:127)
>>>>     at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:379)
>>>>     at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:399)
>>>>     at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNode.insert(BPTreeNode.java:167)
>>>>     at com.hp.hpl.jena.tdb.index.bplustree.BPlusTree.addAndReturnOld(BPlusTree.java:297)
>>>>     at com.hp.hpl.jena.tdb.index.bplustree.BPlusTree.add(BPlusTree.java:289)
>>>>     at com.hp.hpl.jena.tdb.index.TupleIndexRecord.performAdd(TupleIndexRecord.java:48)
>>>>     at com.hp.hpl.jena.tdb.index.TupleIndexBase.add(TupleIndexBase.java:49)
>>>>     at com.hp.hpl.jena.tdb.index.TupleTable.add(TupleTable.java:54)
>>>>     at com.hp.hpl.jena.tdb.nodetable.NodeTupleTableConcrete.addRow(NodeTupleTableConcrete.java:77)
>>>>     at com.hp.hpl.jena.tdb.store.bulkloader.LoaderNodeTupleTable.load(LoaderNodeTupleTable.java:112)
>>>>     at com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader$2.send(BulkLoader.java:268)
>>>>     at com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader$2.send(BulkLoader.java:244)
>>>>     at org.openjena.riot.lang.LangNTuple.runParser(LangNTuple.java:60)
>>>>     at org.openjena.riot.lang.LangBase.parse(LangBase.java:71)
>>>>     at org.openjena.riot.RiotReader.parseQuads(RiotReader.java:122)
>>>>     at com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader.loadQuads$(BulkLoader.java:159)
>>>>     at com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader.loadDataset(BulkLoader.java:117)
>>>>     at com.nimblegraph.data.bin.ReportLoadOnSSD.main(ReportLoadOnSSD.java:68)
>>>>
>>>> Aside from shipping you my laptop, is there anything I can provide you
>>>> with to help track down the issue? I am comfortable building TDB from
>>>> source and setting conditional breakpoints while debugging if that can
>>>> be of any benefit.
>>>>
>>>> Thanks for your help.
>>>> -jp
>>>>
>>>> On Tue, Jun 28, 2011 at 7:17 AM, Andy Seaborne <[email protected]> wrote:
>>>>>
>>>>> Hi there,
>>>>>
>>>>> I now have an SSD (256G from Crucial) :-)
>>>>>
>>>>> /dev/sdb1 on /mnt/ssd1 type ext4 (rw,noatime)
>>>>>
>>>>> and I ran the test program on jamendo-rdf and on
>>>>> mappingbased_properties_en.nt, then on jamendo-rdf with existing data
>>>>> as in the test case.
>>>>>
>>>>> Everything works for me - the loads complete without an exception.
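A note for readers following the first stack trace: java.nio.Buffer.position(int) throws IllegalArgumentException whenever the requested position is negative or greater than the buffer's limit, so the frame above it (RecordFactory.buildFrom) must be asking for an offset outside the block's ByteBuffer. A minimal stdlib-only sketch of that failure mode (the 8-byte size is an illustrative assumption, not TDB's actual block size):

```java
import java.nio.ByteBuffer;

public class BufferPositionDemo {
    public static void main(String[] args) {
        // A small buffer standing in for a TDB record block
        // (the size is an illustrative assumption).
        ByteBuffer bb = ByteBuffer.allocate(8);

        bb.position(4); // fine: 0 <= 4 <= limit (8)

        try {
            // An offset past the limit, as if a record index were
            // computed against the wrong block size.
            bb.position(16);
        } catch (IllegalArgumentException e) {
            System.out.println("IllegalArgumentException: position 16 > limit "
                    + bb.limit());
        }
    }
}
```

This doesn't identify the root cause, but it narrows the symptom: at the point of failure, either the record offset or the buffer's limit is wrong.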
>>>>>
>>>>> Andy
>>>>>
>>>>> On 21/06/11 09:10, Andy Seaborne wrote:
>>>>>>
>>>>>> On 21/06/11 06:01, jp wrote:
>>>>>>>
>>>>>>> Hey Andy,
>>>>>>>
>>>>>>> I wasn't able to unzip the file
>>>>>>> http://people.apache.org/~andy/jamendo.nt.gz; however, I ran it on my
>>>>>>> dataset and I received an out-of-memory exception. I then changed line
>>>>>>> 42 to true and received the original error. You can download the data
>>>>>>> file I have been testing with from
>>>>>>> http://www.kosmyna.com/mappingbased_properties_en.nt.bz2; unzipped it's
>>>>>>> 2.6 GB. This file has consistently failed to load.
>>>>>>
>>>>>> downloads.dbpedia.org is back - I downloaded that file and loaded it
>>>>>> with the test program - no problems.
>>>>>>
>>>>>>> While trying other datasets and variations of the simple program I had
>>>>>>> what seemed to be a successful BulkLoad; however, when I opened the
>>>>>>> dataset and tried to query it there were no results. I don't have the
>>>>>>> exact details of this run but can try to reproduce it if you think it
>>>>>>> would be useful.
>>>>>>
>>>>>> Yes please. At this point, any detail is a help.
>>>>>>
>>>>>> Also, a complete log of the failed load of
>>>>>> mappingbased_properties_en.nt.bz2 would be useful.
>>>>>>
>>>>>> Having looked at the stack traces, and aligned them to the source code,
>>>>>> it appears the code passes an internal consistency check, then fails on
>>>>>> something that the check itself tests for.
>>>>>>
>>>>>> Andy
>>>>>>
>>>>>>> -jp
>>>>>>>
>>>>>>> On Mon, Jun 20, 2011 at 4:57 PM, Andy Seaborne <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Fixed - sorry about that.
>>>>>>>>
>>>>>>>> Andy
>>>>>>>>
>>>>>>>> On 20/06/11 21:50, jp wrote:
>>>>>>>>>
>>>>>>>>> Hey Andy,
>>>>>>>>>
>>>>>>>>> I assume the file you want me to run is
>>>>>>>>> http://people.apache.org/~andy/ReportLoadOnSSD.java
>>>>>>>>>
>>>>>>>>> When I try to download it I get a permissions error. Let me know
>>>>>>>>> when I should try again.
>>>>>>>>>
>>>>>>>>> -jp
>>>>>>>>>
>>>>>>>>> On Mon, Jun 20, 2011 at 3:30 PM, Andy Seaborne <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi there,
>>>>>>>>>>
>>>>>>>>>> I tried to recreate this but couldn't; I don't have an SSD to hand
>>>>>>>>>> at the moment (being fixed :-)
>>>>>>>>>>
>>>>>>>>>> I've put my test program and the data from the jamendo-rdf you
>>>>>>>>>> sent me in:
>>>>>>>>>>
>>>>>>>>>> http://people.apache.org/~andy/
>>>>>>>>>>
>>>>>>>>>> so we can agree on an exact test case. This code is single
>>>>>>>>>> threaded.
>>>>>>>>>>
>>>>>>>>>> The conversion from .rdf to .nt wasn't pure.
>>>>>>>>>>
>>>>>>>>>> I tried running using the in-memory store as well.
>>>>>>>>>> downloads.dbpedia.org was down at the weekend - I'll try to get
>>>>>>>>>> the same dbpedia data.
>>>>>>>>>>
>>>>>>>>>> Could you run exactly what I was running? The file name needs
>>>>>>>>>> changing.
>>>>>>>>>>
>>>>>>>>>> You can also try uncommenting
>>>>>>>>>>     SystemTDB.setFileMode(FileMode.direct) ;
>>>>>>>>>> and run it using non-mapped files in about 1.2 G of heap.
>>>>>>>>>>
>>>>>>>>>> Looking through the stack trace, there is a point where the code
>>>>>>>>>> has passed an internal consistency test, then fails with something
>>>>>>>>>> that should be caught by that test - and the code is sync'ed or
>>>>>>>>>> single threaded. This is, to put it mildly, worrying.
>>>>>>>>>>
>>>>>>>>>> Andy
>>>>>>>>>>
>>>>>>>>>> On 18/06/11 16:38, jp wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hey Andy,
>>>>>>>>>>>
>>>>>>>>>>> My entire program is run in one JVM as follows.
>>>>>>>>>>>
>>>>>>>>>>> public static void main(String[] args) throws IOException {
>>>>>>>>>>>     DatasetGraphTDB datasetGraph = TDBFactory.createDatasetGraph(tdbDir);
>>>>>>>>>>>
>>>>>>>>>>>     /* I saw the BulkLoader had two ways of loading data based on
>>>>>>>>>>>        whether the dataset existed already. I did two runs, one with
>>>>>>>>>>>        the following two lines commented out, to test both ways the
>>>>>>>>>>>        BulkLoader runs. Hopefully this had the desired effect. */
>>>>>>>>>>>     datasetGraph.getDefaultGraph().add(new Triple(Node.createURI("urn:hello"),
>>>>>>>>>>>         RDF.type.asNode(), Node.createURI("urn:house")));
>>>>>>>>>>>     datasetGraph.sync();
>>>>>>>>>>>
>>>>>>>>>>>     InputStream inputStream = new FileInputStream(dbpediaData);
>>>>>>>>>>>
>>>>>>>>>>>     BulkLoader bulkLoader = new BulkLoader();
>>>>>>>>>>>     bulkLoader.loadDataset(datasetGraph, inputStream, true);
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> The data can be found here:
>>>>>>>>>>> http://downloads.dbpedia.org/3.6/en/mappingbased_properties_en.nt.bz2
>>>>>>>>>>> I appended the ontology to the end of the file; it can be found here:
>>>>>>>>>>> http://downloads.dbpedia.org/3.6/dbpedia_3.6.owl.bz2
>>>>>>>>>>>
>>>>>>>>>>> The tdbDir is an empty directory.
>>>>>>>>>>> On my system the error starts occurring after about 2-3 minutes
>>>>>>>>>>> and 8-12 million triples loaded.
>>>>>>>>>>>
>>>>>>>>>>> Thanks for looking over this and please let me know if I can be
>>>>>>>>>>> of further assistance.
>>>>>>>>>>>
>>>>>>>>>>> -jp
>>>>>>>>>>> [email protected]
>>>>>>>>>>>
>>>>>>>>>>> On Jun 17, 2011 9:29 am, andy wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> jp,
>>>>>>>>>>>>
>>>>>>>>>>>> How does this fit with running:
>>>>>>>>>>>>
>>>>>>>>>>>>     datasetGraph.getDefaultGraph().add(new Triple(Node.createURI("urn:hello"),
>>>>>>>>>>>>         RDF.type.asNode(), Node.createURI("urn:house")));
>>>>>>>>>>>>     datasetGraph.sync();
>>>>>>>>>>>>
>>>>>>>>>>>> Is the preload of one triple a separate JVM or the same JVM as
>>>>>>>>>>>> the BulkLoader call - could you provide a single complete minimal
>>>>>>>>>>>> example?
>>>>>>>>>>>>
>>>>>>>>>>>> In attempting to reconstruct this, I don't want to hide the
>>>>>>>>>>>> problem by guessing how things are wired together.
>>>>>>>>>>>>
>>>>>>>>>>>> Also - exactly which dbpedia file are you loading (URL?),
>>>>>>>>>>>> although I doubt the exact data is the cause here.
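On the "make_classpath: command not found" report earlier in the thread: line 14 of the tdbloader2 script evidently invokes a helper called make_classpath that it expects to resolve via $PATH, so a $PATH typo produces exactly that shell error rather than a Java one. A quick sketch of the diagnosis (the empty directory is a placeholder assumption):

```shell
# With a PATH that lacks the TDB bin directory, the shell cannot resolve
# the helper script -- the same symptom tdbloader2 reported at its line 14.
mkdir -p /tmp/empty-bin
if ! PATH=/tmp/empty-bin command -v make_classpath >/dev/null 2>&1; then
    echo 'make_classpath not found: check that $TDBROOT/bin is on $PATH'
fi
```

As noted in the thread, fixing the $PATH entry made tdbloader2 run; rerunning `command -v make_classpath` after editing the variable confirms the fix without starting a load.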
