Hey Andy, I assume the file you want me to run is http://people.apache.org/~andy/ReportLoadOnSSD.java
When I try to download it I get a permissions error. Let me know when I should try again.

-jp

On Mon, Jun 20, 2011 at 3:30 PM, Andy Seaborne <[email protected]> wrote:
> Hi there,
>
> I tried to recreate this but couldn't, but I don't have an SSD to hand at
> the moment (being fixed :-)
>
> I've put my test program and the data from the jamendo-rdf you sent me in:
>
>     http://people.apache.org/~andy/
>
> so we can agree on exactly a test case. This code is single-threaded.
>
> The conversion from .rdf to .nt wasn't pure.
>
> I tried running using the in-memory store as well.
> downloads.dbpedia.org was down at the weekend - I'll try to get the same
> dbpedia data.
>
> Could you run exactly what I was running? The file name needs changing.
>
> You can also try uncommenting
>
>     SystemTDB.setFileMode(FileMode.direct) ;
>
> and run it using non-mapped files in about 1.2 G of heap.
>
> Looking through the stacktrace, there is a point where the code has passed
> an internal consistency test and then fails with something that should be
> caught by that test - and the code is sync'ed or single-threaded. This is,
> to put it mildly, worrying.
>
> Andy
>
> On 18/06/11 16:38, jp wrote:
>>
>> Hey Andy,
>>
>> My entire program is run in one JVM as follows:
>>
>>     public static void main(String[] args) throws IOException {
>>         DatasetGraphTDB datasetGraph = TDBFactory.createDatasetGraph(tdbDir);
>>
>>         /* I saw the BulkLoader had two ways of loading data based on
>>            whether the dataset existed already. I did two runs, one with
>>            the following two lines commented out, to test both ways the
>>            BulkLoader runs. Hopefully this had the desired effect. */
>>         datasetGraph.getDefaultGraph().add(new Triple(
>>             Node.createURI("urn:hello"), RDF.type.asNode(),
>>             Node.createURI("urn:house")));
>>         datasetGraph.sync();
>>
>>         InputStream inputStream = new FileInputStream(dbpediaData);
>>
>>         BulkLoader bulkLoader = new BulkLoader();
>>         bulkLoader.loadDataset(datasetGraph, inputStream, true);
>>     }
>>
>> The data can be found here:
>>     http://downloads.dbpedia.org/3.6/en/mappingbased_properties_en.nt.bz2
>> I appended the ontology to the end of the file; it can be found here:
>>     http://downloads.dbpedia.org/3.6/dbpedia_3.6.owl.bz2
>>
>> The tdbDir is an empty directory.
>> On my system the error starts occurring after about 2-3 minutes and
>> 8-12 million triples loaded.
>>
>> Thanks for looking over this, and please let me know if I can be of
>> further assistance.
>>
>> -jp
>> [email protected]
>>
>> On Jun 17, 2011 9:29 am, andy wrote:
>>>
>>> jp,
>>>
>>> How does this fit with running:
>>>
>>>     datasetGraph.getDefaultGraph().add(new Triple(
>>>         Node.createURI("urn:hello"), RDF.type.asNode(),
>>>         Node.createURI("urn:house")));
>>>     datasetGraph.sync();
>>>
>>> Is the preload of one triple a separate JVM or the same JVM as the
>>> BulkLoader call - could you provide a single complete minimal example?
>>>
>>> In attempting to reconstruct this, I don't want to hide the problem by
>>> guessing how things are wired together.
>>>
>>> Also - exactly which dbpedia file are you loading (URL)? Although I
>>> doubt the exact data is the cause here.
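P.S. For reference, here is the single complete file I'm running, assembled from the snippets quoted above, with the SystemTDB.setFileMode line Andy suggested uncommented. This is only a sketch: the tdbDir and dbpediaData paths are placeholders, the class name is mine, and the import paths assume the TDB 0.8.x-era package layout, so they may need adjusting for other versions.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import com.hp.hpl.jena.graph.Node;
import com.hp.hpl.jena.graph.Triple;
import com.hp.hpl.jena.vocabulary.RDF;
import com.hp.hpl.jena.tdb.TDBFactory;
import com.hp.hpl.jena.tdb.base.file.FileMode;
import com.hp.hpl.jena.tdb.store.DatasetGraphTDB;
import com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader;
import com.hp.hpl.jena.tdb.sys.SystemTDB;

public class ReproduceLoadFailure {
    // Placeholder paths: point these at an empty TDB directory and the
    // decompressed dbpedia .nt file before running.
    static final String tdbDir = "/tmp/tdb";
    static final String dbpediaData = "/tmp/mappingbased_properties_en.nt";

    public static void main(String[] args) throws IOException {
        // Andy's suggestion: force non-memory-mapped (direct) file access;
        // needs roughly 1.2 G of heap.
        SystemTDB.setFileMode(FileMode.direct);

        DatasetGraphTDB datasetGraph = TDBFactory.createDatasetGraph(tdbDir);

        // Optional one-triple preload so BulkLoader sees a non-empty dataset;
        // comment out these two statements to test the empty-dataset path.
        datasetGraph.getDefaultGraph().add(new Triple(
                Node.createURI("urn:hello"), RDF.type.asNode(),
                Node.createURI("urn:house")));
        datasetGraph.sync();

        InputStream inputStream = new FileInputStream(dbpediaData);
        BulkLoader bulkLoader = new BulkLoader();
        bulkLoader.loadDataset(datasetGraph, inputStream, true);
    }
}
```

Both runs (with and without the preload) happen in this same single JVM invocation, which I believe answers Andy's question about how the pieces are wired together.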
