The complete log file is over 13 GB. I have posted the first 5000 lines here:
http://www.kosmyna.com/ReportLoadOnSSD.log.5000lines

The run of tdbloader failed as well. The first 5000 lines can be found here:
http://www.kosmyna.com/tdbloader.log.5000lines
I could not run tdbloader2; I get the following error:

    ./tdbloader2: line 14: make_classpath: command not found

I have the TDBROOT environment variable set correctly and am using this version of TDB:
http://svn.apache.org/repos/asf/incubator/jena/Jena2/TDB/tags/TDB-0.8.10/bin

-jp

On Tue, Jun 28, 2011 at 4:30 PM, Andy Seaborne <[email protected]> wrote:
>> Aside from shipping you my laptop, is there anything I can provide you
>> with to help track down the issue?
>
> A complete log, with the exception, would help to identify the point where
> it fails. It's a possible clue.
>
> Could you also try running tdbloader and tdbloader2 to bulk load the files?
>
>         Andy
>
> On 28/06/11 21:19, jp wrote:
>>
>> Hey Andy,
>>
>> Saw the twitter message - a 29% load speed increase is pretty nice. Glad I
>> could give you the excuse to upgrade :) Though it worries me that you
>> don't receive the same exception I do. I consistently have loading
>> issues using the file posted at
>> http://www.kosmyna.com/mappingbased_properties_en.nt.bz2. I can get
>> the test program to complete by making the following changes, but it's
>> slow (30 minutes).
>>
>>     SystemTDB.setFileMode(FileMode.direct) ;
>>
>>     if ( true ) {
>>         String dir = "/home/jp/scratch/ssdtest/DB-X" ;
>>         FileOps.clearDirectory(dir) ;
>>         datasetGraph = TDBFactory.createDatasetGraph(dir);
>>     }
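>>
>> For reference, the complete working (direct-mode) variant is roughly the
>> following - a sketch from memory, so the imports, file names and paths
>> are approximate:
>>
>>     import java.io.FileInputStream ;
>>     import java.io.InputStream ;
>>
>>     import org.openjena.atlas.lib.FileOps ;
>>
>>     import com.hp.hpl.jena.tdb.TDBFactory ;
>>     import com.hp.hpl.jena.tdb.base.block.FileMode ;
>>     import com.hp.hpl.jena.tdb.store.DatasetGraphTDB ;
>>     import com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader ;
>>     import com.hp.hpl.jena.tdb.sys.SystemTDB ;
>>
>>     public class ReportLoadOnSSD {
>>         public static void main(String[] args) throws Exception {
>>             // Non-mapped (direct) file mode: completes, but slowly.
>>             SystemTDB.setFileMode(FileMode.direct) ;
>>
>>             // Start from an empty database directory on every run.
>>             String dir = "/home/jp/scratch/ssdtest/DB-X" ;
>>             FileOps.clearDirectory(dir) ;
>>             DatasetGraphTDB datasetGraph = TDBFactory.createDatasetGraph(dir) ;
>>
>>             // Bulk load the unpacked N-Triples file.
>>             InputStream in = new FileInputStream("mappingbased_properties_en.nt") ;
>>             new BulkLoader().loadDataset(datasetGraph, in, true) ;
>>         }
>>     }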
>>
>> Running the program with the section of code below fails every time.
>>
>>     //SystemTDB.setFileMode(FileMode.direct) ;
>>
>>     if ( true ) {
>>         String dir = "/home/jp/scratch/ssdtest/DB-X" ;
>>         FileOps.clearDirectory(dir) ;
>>         datasetGraph = TDBFactory.createDatasetGraph(dir);
>>     }
>>
>> The exception:
>>
>>     java.lang.IllegalArgumentException
>>       at java.nio.Buffer.position(Buffer.java:235)
>>       at com.hp.hpl.jena.tdb.base.record.RecordFactory.buildFrom(RecordFactory.java:94)
>>       at com.hp.hpl.jena.tdb.base.buffer.RecordBuffer._get(RecordBuffer.java:95)
>>       at com.hp.hpl.jena.tdb.base.buffer.RecordBuffer.get(RecordBuffer.java:41)
>>       at com.hp.hpl.jena.tdb.index.bplustree.BPTreeRecords.getSplitKey(BPTreeRecords.java:141)
>>       at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNode.split(BPTreeNode.java:435)
>>       at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:387)
>>       at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:399)
>>       at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNode.insert(BPTreeNode.java:167)
>>       at com.hp.hpl.jena.tdb.index.bplustree.BPlusTree.addAndReturnOld(BPlusTree.java:297)
>>       at com.hp.hpl.jena.tdb.index.bplustree.BPlusTree.add(BPlusTree.java:289)
>>       at com.hp.hpl.jena.tdb.index.TupleIndexRecord.performAdd(TupleIndexRecord.java:48)
>>       at com.hp.hpl.jena.tdb.index.TupleIndexBase.add(TupleIndexBase.java:49)
>>       at com.hp.hpl.jena.tdb.index.TupleTable.add(TupleTable.java:54)
>>       at com.hp.hpl.jena.tdb.nodetable.NodeTupleTableConcrete.addRow(NodeTupleTableConcrete.java:77)
>>       at com.hp.hpl.jena.tdb.store.bulkloader.LoaderNodeTupleTable.load(LoaderNodeTupleTable.java:112)
>>       at com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader$2.send(BulkLoader.java:268)
>>       at com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader$2.send(BulkLoader.java:244)
>>       at org.openjena.riot.lang.LangNTuple.runParser(LangNTuple.java:60)
>>       at org.openjena.riot.lang.LangBase.parse(LangBase.java:71)
>>       at org.openjena.riot.RiotReader.parseQuads(RiotReader.java:122)
>>       at com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader.loadQuads$(BulkLoader.java:159)
>>       at com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader.loadDataset(BulkLoader.java:117)
>>       at com.nimblegraph.data.bin.ReportLoadOnSSD.main(ReportLoadOnSSD.java:68)
>>
>> http://dbpedia.org/resource/Spirea_X
>> http://dbpedia.org/ontology/associatedBand
>> http://dbpedia.org/resource/Adventures_in_Stereo
>>
>> If I continue to let it run, I start seeing this error as well:
>>
>>     com.hp.hpl.jena.tdb.TDBException: No known block type for 4
>>       at com.hp.hpl.jena.tdb.base.block.BlockType.extract(BlockType.java:64)
>>       at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNodeMgr.getType(BPTreeNodeMgr.java:166)
>>       at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNodeMgr.access$200(BPTreeNodeMgr.java:22)
>>       at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNodeMgr$Block2BPTreeNode.fromByteBuffer(BPTreeNodeMgr.java:136)
>>       at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNodeMgr.get(BPTreeNodeMgr.java:84)
>>       at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNode.get(BPTreeNode.java:127)
>>       at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:379)
>>       at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:399)
>>       at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNode.insert(BPTreeNode.java:167)
>>       at com.hp.hpl.jena.tdb.index.bplustree.BPlusTree.addAndReturnOld(BPlusTree.java:297)
>>       at com.hp.hpl.jena.tdb.index.bplustree.BPlusTree.add(BPlusTree.java:289)
>>       at com.hp.hpl.jena.tdb.index.TupleIndexRecord.performAdd(TupleIndexRecord.java:48)
>>       at com.hp.hpl.jena.tdb.index.TupleIndexBase.add(TupleIndexBase.java:49)
>>       at com.hp.hpl.jena.tdb.index.TupleTable.add(TupleTable.java:54)
>>       at com.hp.hpl.jena.tdb.nodetable.NodeTupleTableConcrete.addRow(NodeTupleTableConcrete.java:77)
>>       at com.hp.hpl.jena.tdb.store.bulkloader.LoaderNodeTupleTable.load(LoaderNodeTupleTable.java:112)
>>       at com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader$2.send(BulkLoader.java:268)
>>       at com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader$2.send(BulkLoader.java:244)
>>       at org.openjena.riot.lang.LangNTuple.runParser(LangNTuple.java:60)
>>       at org.openjena.riot.lang.LangBase.parse(LangBase.java:71)
>>       at org.openjena.riot.RiotReader.parseQuads(RiotReader.java:122)
>>       at com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader.loadQuads$(BulkLoader.java:159)
>>       at com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader.loadDataset(BulkLoader.java:117)
>>       at com.nimblegraph.data.bin.ReportLoadOnSSD.main(ReportLoadOnSSD.java:68)
>>
>> Aside from shipping you my laptop, is there anything I can provide you
>> with to help track down the issue? I am comfortable building TDB from
>> source and setting conditional breakpoints while debugging if that can
>> be of any benefit.
>>
>> Thanks for your help.
>> -jp
>>
>> On Tue, Jun 28, 2011 at 7:17 AM, Andy Seaborne
>> <[email protected]> wrote:
>>>
>>> Hi there,
>>>
>>> I now have an SSD (256G from Crucial) :-)
>>>
>>>     /dev/sdb1 on /mnt/ssd1 type ext4 (rw,noatime)
>>>
>>> and I ran the test program on jamendo-rdf and on
>>> mappingbased_properties_en.nt, then on jamendo-rdf with existing data as
>>> in the test case.
>>>
>>> Everything works for me - the loads complete without an exception.
>>>
>>>         Andy
>>>
>>> On 21/06/11 09:10, Andy Seaborne wrote:
>>>>
>>>> On 21/06/11 06:01, jp wrote:
>>>>>
>>>>> Hey Andy,
>>>>>
>>>>> I wasn't able to unzip the file
>>>>> http://people.apache.org/~andy/jamendo.nt.gz; however, I ran it on my
>>>>> dataset and received an out of memory exception. I then changed line
>>>>> 42 to true and received the original error. You can download the data
>>>>> file I have been testing with from
>>>>> http://www.kosmyna.com/mappingbased_properties_en.nt.bz2 - unzipped
>>>>> it's 2.6 GB. This file has consistently failed to load.
>>>>
>>>> downloads.dbpedia.org is back - I downloaded that file and loaded it
>>>> with the test program - no problems.
>>>>
>>>>> While trying other datasets and variations of the simple program I had
>>>>> what seemed to be a successful BulkLoad; however, when I opened the
>>>>> dataset and tried to query it, there were no results. I don't have the
>>>>> exact details of this run but can try to reproduce it if you think it
>>>>> would be useful.
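>>>>>
>>>>> The check I ran was along these lines - a sketch from memory, so the
>>>>> details may differ, and the directory name is just an example:
>>>>>
>>>>>     import com.hp.hpl.jena.query.* ;
>>>>>     import com.hp.hpl.jena.tdb.TDBFactory ;
>>>>>
>>>>>     public class CheckLoad {
>>>>>         public static void main(String[] args) {
>>>>>             // Re-open the existing TDB database directory.
>>>>>             Dataset dataset = TDBFactory.createDataset("/home/jp/scratch/ssdtest/DB-X") ;
>>>>>
>>>>>             // Count the triples in the default graph.
>>>>>             String q = "SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }" ;
>>>>>             QueryExecution qexec = QueryExecutionFactory.create(q, dataset) ;
>>>>>             ResultSetFormatter.out(qexec.execSelect()) ;
>>>>>             qexec.close() ;
>>>>>         }
>>>>>     }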
>>>>
>>>> Yes please. At this point, any details are a help.
>>>>
>>>> Also, a complete log of the failed load of
>>>> mappingbased_properties_en.nt.bz2 would be useful.
>>>>
>>>> Having looked at the stack traces, and aligned them with the source
>>>> code, it appears the code passes an internal consistency check, then
>>>> fails on something that that same check tests for.
>>>>
>>>>         Andy
>>>>
>>>>> -jp
>>>>>
>>>>> On Mon, Jun 20, 2011 at 4:57 PM, Andy Seaborne
>>>>> <[email protected]> wrote:
>>>>>>
>>>>>> Fixed - sorry about that.
>>>>>>
>>>>>>         Andy
>>>>>>
>>>>>> On 20/06/11 21:50, jp wrote:
>>>>>>>
>>>>>>> Hey Andy,
>>>>>>>
>>>>>>> I assume the file you want me to run is
>>>>>>> http://people.apache.org/~andy/ReportLoadOnSSD.java
>>>>>>>
>>>>>>> When I try to download it I get a permissions error. Let me know
>>>>>>> when I should try again.
>>>>>>>
>>>>>>> -jp
>>>>>>>
>>>>>>> On Mon, Jun 20, 2011 at 3:30 PM, Andy Seaborne
>>>>>>> <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Hi there,
>>>>>>>>
>>>>>>>> I tried to recreate this but couldn't - though I don't have an SSD
>>>>>>>> to hand at the moment (being fixed :-)
>>>>>>>>
>>>>>>>> I've put my test program and the data from the jamendo-rdf you sent
>>>>>>>> me in:
>>>>>>>>
>>>>>>>> http://people.apache.org/~andy/
>>>>>>>>
>>>>>>>> so we can agree on an exact test case. This code is single threaded.
>>>>>>>>
>>>>>>>> The conversion from .rdf to .nt wasn't pure.
>>>>>>>>
>>>>>>>> I tried running using the in-memory store as well.
>>>>>>>> downloads.dbpedia.org was down at the weekend - I'll try to get the
>>>>>>>> same dbpedia data.
>>>>>>>>
>>>>>>>> Could you run exactly what I was running? The file name needs
>>>>>>>> changing.
>>>>>>>>
>>>>>>>> You can also try uncommenting
>>>>>>>>
>>>>>>>>     SystemTDB.setFileMode(FileMode.direct) ;
>>>>>>>>
>>>>>>>> and run it using non-mapped files in about 1.2 G of heap.
>>>>>>>>
>>>>>>>> Looking through the stack trace, there is a point where the code has
>>>>>>>> passed an internal consistency test, then fails with something that
>>>>>>>> should be caught by that test - and the code is sync'ed or
>>>>>>>> single-threaded. This is, to put it mildly, worrying.
>>>>>>>>
>>>>>>>>         Andy
>>>>>>>>
>>>>>>>> On 18/06/11 16:38, jp wrote:
>>>>>>>>>
>>>>>>>>> Hey Andy,
>>>>>>>>>
>>>>>>>>> My entire program is run in one JVM, as follows:
>>>>>>>>>
>>>>>>>>>     public static void main(String[] args) throws IOException {
>>>>>>>>>         DatasetGraphTDB datasetGraph =
>>>>>>>>>             TDBFactory.createDatasetGraph(tdbDir);
>>>>>>>>>
>>>>>>>>>         /* I saw the BulkLoader had two ways of loading data based
>>>>>>>>>            on whether the dataset existed already. I did two runs,
>>>>>>>>>            one with the following two lines commented out, to test
>>>>>>>>>            both ways the BulkLoader runs. Hopefully this had the
>>>>>>>>>            desired effect. */
>>>>>>>>>         datasetGraph.getDefaultGraph().add(new Triple(
>>>>>>>>>             Node.createURI("urn:hello"), RDF.type.asNode(),
>>>>>>>>>             Node.createURI("urn:house")));
>>>>>>>>>         datasetGraph.sync();
>>>>>>>>>
>>>>>>>>>         InputStream inputStream = new FileInputStream(dbpediaData);
>>>>>>>>>
>>>>>>>>>         BulkLoader bulkLoader = new BulkLoader();
>>>>>>>>>         bulkLoader.loadDataset(datasetGraph, inputStream, true);
>>>>>>>>>     }
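>>>>>>>>>
>>>>>>>>> As a single compilable unit, with the two constants filled in, that
>>>>>>>>> is roughly the following - a sketch, where the class name, paths and
>>>>>>>>> imports are my best reconstruction:
>>>>>>>>>
>>>>>>>>>     import java.io.FileInputStream;
>>>>>>>>>     import java.io.IOException;
>>>>>>>>>     import java.io.InputStream;
>>>>>>>>>
>>>>>>>>>     import com.hp.hpl.jena.graph.Node;
>>>>>>>>>     import com.hp.hpl.jena.graph.Triple;
>>>>>>>>>     import com.hp.hpl.jena.tdb.TDBFactory;
>>>>>>>>>     import com.hp.hpl.jena.tdb.store.DatasetGraphTDB;
>>>>>>>>>     import com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader;
>>>>>>>>>     import com.hp.hpl.jena.vocabulary.RDF;
>>>>>>>>>
>>>>>>>>>     public class LoadDbpedia {
>>>>>>>>>         // Example paths: an empty database directory and the data file.
>>>>>>>>>         static String tdbDir = "/home/jp/scratch/ssdtest/DB-X";
>>>>>>>>>         static String dbpediaData = "mappingbased_properties_en.nt";
>>>>>>>>>
>>>>>>>>>         public static void main(String[] args) throws IOException {
>>>>>>>>>             DatasetGraphTDB datasetGraph =
>>>>>>>>>                 TDBFactory.createDatasetGraph(tdbDir);
>>>>>>>>>
>>>>>>>>>             // Preload one triple and sync, so the dataset is non-empty
>>>>>>>>>             // and BulkLoader takes its existing-data path. Comment these
>>>>>>>>>             // two statements out to exercise the empty-dataset path.
>>>>>>>>>             datasetGraph.getDefaultGraph().add(new Triple(
>>>>>>>>>                 Node.createURI("urn:hello"), RDF.type.asNode(),
>>>>>>>>>                 Node.createURI("urn:house")));
>>>>>>>>>             datasetGraph.sync();
>>>>>>>>>
>>>>>>>>>             // Bulk load the data (third argument as in the original run).
>>>>>>>>>             InputStream inputStream = new FileInputStream(dbpediaData);
>>>>>>>>>             BulkLoader bulkLoader = new BulkLoader();
>>>>>>>>>             bulkLoader.loadDataset(datasetGraph, inputStream, true);
>>>>>>>>>         }
>>>>>>>>>     }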
>>>>>>>>>
>>>>>>>>> The data can be found here:
>>>>>>>>> http://downloads.dbpedia.org/3.6/en/mappingbased_properties_en.nt.bz2
>>>>>>>>> I appended the ontology to the end of the file; it can be found here:
>>>>>>>>> http://downloads.dbpedia.org/3.6/dbpedia_3.6.owl.bz2
>>>>>>>>>
>>>>>>>>> The tdbDir is an empty directory.
>>>>>>>>> On my system the error starts occurring after about 2-3 minutes and
>>>>>>>>> 8-12 million triples loaded.
>>>>>>>>>
>>>>>>>>> Thanks for looking over this, and please let me know if I can be of
>>>>>>>>> further assistance.
>>>>>>>>>
>>>>>>>>> -jp
>>>>>>>>> [email protected]
>>>>>>>>>
>>>>>>>>> On Jun 17, 2011 9:29 am, Andy wrote:
>>>>>>>>>>
>>>>>>>>>> jp,
>>>>>>>>>>
>>>>>>>>>> How does this fit with running:
>>>>>>>>>>
>>>>>>>>>>     datasetGraph.getDefaultGraph().add(new Triple(
>>>>>>>>>>         Node.createURI("urn:hello"), RDF.type.asNode(),
>>>>>>>>>>         Node.createURI("urn:house")));
>>>>>>>>>>     datasetGraph.sync();
>>>>>>>>>>
>>>>>>>>>> Is the preload of one triple a separate JVM or the same JVM as the
>>>>>>>>>> BulkLoader call - could you provide a single complete minimal
>>>>>>>>>> example?
>>>>>>>>>>
>>>>>>>>>> In attempting to reconstruct this, I don't want to hide the
>>>>>>>>>> problem by guessing how things are wired together.
>>>>>>>>>>
>>>>>>>>>> Also - exactly which dbpedia file are you loading (URL?), although
>>>>>>>>>> I doubt the exact data is the cause here.
