Hi,
I might be doing something stupid, but I think I produced a minimal
example which shows the problem.
I have two files: 1.ttl and 2.ttl.
Here they are:
---------[ 1.ttl ]---------
<http://example.com/1> <http://example.com/ns#p> <http://example.com/2> .
---------------------------
---------[ 2.ttl ]---------
<http://example.com/3> <http://example.com/ns#p> <http://example.com/4> .
---------------------------
This is what I do:
public static void dumpObjectFile(Location location) {
    ObjectFile objects = FileFactory.createObjectFileDisk(
        location.getPath(Names.indexId2Node, Names.extNodeData)) ;
    Iterator<Pair<Long,ByteBuffer>> iter = objects.all() ;
    while ( iter.hasNext() ) {
        System.out.println(iter.next()) ;
    }
}
public static void load(Location location, String filename) {
    StoreConnection sc = StoreConnection.make(location) ;
    DatasetGraphTxn dsg = sc.begin(ReadWrite.WRITE) ;
    TDBLoader.load(dsg, filename) ;
    dsg.commit() ;
    TDB.sync(dsg) ;
    dsg.close() ;
    StoreConnection.release(location) ;
}
public static void main(String[] args) {
    String path = "/home/castagna/Desktop/" ;
    Location location = new Location(path + "tdb") ;
    // 1
    load(location, path + "1.ttl") ;
    // 2
    // dumpObjectFile(location) ;
    // 3
    // load(location, path + "2.ttl") ;
    // 4
    // dumpObjectFile(location) ;
}
I first load the first file (i.e. 1.ttl).
Then I comment out step 1 and uncomment step 2: dumpObjectFile.
I then comment out step 2 and uncomment step 3 to load the second file (i.e. 2.ttl),
this time into an existing TDB location.
Finally, I comment out step 3 and uncomment step 4 to dump the object file again.
This time the nodes.dat file is corrupted.
I tried to use the Model.read(...) method instead of TDBLoader.load(...).
The effect is the same.
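For reference, the Model-based variant I mean looks roughly like this (just a sketch:
loadViaModel is a made-up name and these are the usual Jena calls, not necessarily the
exact ones I ran):

public static void loadViaModel(Location location, String filename) {
    StoreConnection sc = StoreConnection.make(location) ;
    DatasetGraphTxn dsg = sc.begin(ReadWrite.WRITE) ;
    // wrap the transactional default graph in a Model and read the file into it
    Model model = ModelFactory.createModelForGraph(dsg.getDefaultGraph()) ;
    FileManager.get().readModel(model, filename) ;  // syntax guessed from the file extension
    dsg.commit() ;
    dsg.close() ;
    StoreConnection.release(location) ;
}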
I also tried removing the TDB.sync(dsg) call, since I don't think it is necessary there.
The effect is the same.
Am I missing something obvious here?
Paolo
Simon Helsen wrote:
Paolo,
In our tests, we are not using TDBLoader.load directly, but we do use
public Model add( Model m ), which in turn calls
getBulkUpdateHandler().add( m.getGraph(), !suppressReifications );
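Roughly like this (a sketch, not our exact code: TxModelWrapper, its model field and
the suppressReifications flag are stand-ins for our own wrapper class):

class TxModelWrapper {
    private final Model model ;
    private final boolean suppressReifications ;

    TxModelWrapper(Model model, boolean suppressReifications) {
        this.model = model ;
        this.suppressReifications = suppressReifications ;
    }

    public Model add( Model m ) {
        // delegate to the bulk update handler of the underlying graph
        model.getGraph().getBulkUpdateHandler().add( m.getGraph(), !suppressReifications ) ;
        return model ;
    }
}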
Not sure if that helps in the analysis
Simon
From: Paolo Castagna <[email protected]>
To: [email protected]
Date: 09/28/2011 08:46 AM
Subject: Re: TxTDB - com.hp.hpl.jena.tdb.base.file.FileException: Impossibly large object
Hi,
I am currently investigating the issue.
So far, I managed to get an initial copy of the TDB indexes which is not corrupted (~2.6GB).
We then applied ~635 updates to it (and for each transaction I have the data which was
submitted). I then re-applied the changes with a little program which uses TxTDB only
(via TDBLoader.load(...)). At the end of this, the nodes.dat file is corrupted.
This is just doing:
StoreConnection sc = StoreConnection.make(location) ;
for ( int i = 1; i < 636; i++ ) {
    System.out.println(i) ;
    DatasetGraphTxn dsg = sc.begin(ReadWrite.WRITE) ;
    TDBLoader.load(dsg, "/tmp/updates/" + i + ".ttl") ;
    dsg.commit() ;
    dsg.close() ;
}
I tried to apply the same changes to an initially empty TDB database and there are no problems.
Now, I am double-checking the integrity of my initial TDB indexes.
I then proceed by applying one change at a time and verifying integrity (via a dump).
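The integrity check is just a sequential scan of nodes.dat, something along these lines
(a sketch; the 10 MB limit is an arbitrary sanity threshold I picked, not a TDB constant):

public static boolean verifyObjectFile(Location location) {
    ObjectFile objects = FileFactory.createObjectFileDisk(
        location.getPath(Names.indexId2Node, Names.extNodeData)) ;
    Iterator<Pair<Long,ByteBuffer>> iter = objects.all() ;
    while ( iter.hasNext() ) {
        Pair<Long,ByteBuffer> pair = iter.next() ;
        // flag anything far too large to plausibly be a serialized RDF node
        if ( pair.cdr().capacity() > 10 * 1024 * 1024 ) {
            System.err.println("Suspicious object " + pair.car() + " : "
                + pair.cdr().capacity() + " bytes") ;
            return false ;
        }
    }
    return true ;
}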
Paolo
Simon Helsen wrote:
thanks Paolo,
this is related to JENA-91. In fact, that is how our problems started.
Glad someone else was able to reproduce it.
Simon
From: Paolo Castagna <[email protected]>
To: [email protected]
Date: 09/28/2011 06:47 AM
Subject: Re: TxTDB - com.hp.hpl.jena.tdb.base.file.FileException: Impossibly large object
The object file of the node table (i.e. nodes.dat) is corrupted.
I tried to read it sequentially and I get:
(318670, java.nio.HeapByteBuffer[pos=0 lim=22 cap=22])
But, after that, the length of the next ByteBuffer is: 909129782 (*).
Paolo
(*) Running a simple program to iterate through all the Pair<Long, ByteBuffer>
in the ObjectFile and debugging it: ObjectFileDiskDirect, line 176.
Paolo Castagna wrote:
Hi,
we are using/testing TxTDB.
In this case, we just perform a series of WRITE transactions (sequentially, one
after the other) and then issue a SPARQL query (as a READ transaction).
There are no exceptions during the WRITE transactions.
This is the exception we see when we issue the SPARQL query:
com.hp.hpl.jena.tdb.base.file.FileException: ObjectFile.read(9863)[119398665][119079969]: Impossibly large object : 1752462448 bytes
    at com.hp.hpl.jena.tdb.base.objectfile.ObjectFileStorage.read(ObjectFileStorage.java:282)
    at com.hp.hpl.jena.tdb.lib.NodeLib.fetchDecode(NodeLib.java:60)
    at com.hp.hpl.jena.tdb.nodetable.NodeTableNative.readNodeFromTable(NodeTableNative.java:164)
    at com.hp.hpl.jena.tdb.nodetable.NodeTableNative._retrieveNodeByNodeId(NodeTableNative.java:88)
    at com.hp.hpl.jena.tdb.nodetable.NodeTableNative.getNodeForNodeId(NodeTableNative.java:59)
    at com.hp.hpl.jena.tdb.nodetable.NodeTableCache._retrieveNodeByNodeId(NodeTableCache.java:89)
    at com.hp.hpl.jena.tdb.nodetable.NodeTableCache.getNodeForNodeId(NodeTableCache.java:60)
    at com.hp.hpl.jena.tdb.nodetable.NodeTableWrapper.getNodeForNodeId(NodeTableWrapper.java:44)
    at com.hp.hpl.jena.tdb.nodetable.NodeTableInline.getNodeForNodeId(NodeTableInline.java:56)
    at com.hp.hpl.jena.tdb.nodetable.NodeTableWrapper.getNodeForNodeId(NodeTableWrapper.java:44)
    at com.hp.hpl.jena.tdb.solver.BindingTDB.get1(BindingTDB.java:92)
    at com.hp.hpl.jena.sparql.engine.binding.BindingBase.get(BindingBase.java:106)
    at com.hp.hpl.jena.sparql.core.ResultBinding._get(ResultBinding.java:44)
    at com.hp.hpl.jena.sparql.core.QuerySolutionBase.get(QuerySolutionBase.java:20)
    at com.hp.hpl.jena.sparql.resultset.ResultSetApply.apply(ResultSetApply.java:35)
    at com.hp.hpl.jena.sparql.resultset.JSONOutput.format(JSONOutput.java:23)
    at com.hp.hpl.jena.query.ResultSetFormatter.outputAsJSON(ResultSetFormatter.java:584)
    [...]
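As a side observation (my assumption about the cause, not a confirmed diagnosis):
1752462448 is 0x68747470, i.e. the ASCII bytes "http" read as a big-endian int, which
suggests the read is landing on node content where a length field is expected.
A quick check:

public class DecodeBogusLength {
    public static void main(String[] args) throws Exception {
        int bogusLength = 1752462448 ;  // the "impossibly large" size from the exception
        byte[] bytes = java.nio.ByteBuffer.allocate(4).putInt(bogusLength).array() ;
        System.out.println(new String(bytes, "US-ASCII")) ;  // prints "http"
    }
}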
This was with an Oracle JVM, 1.6.0_25 64-bit, on a VM (on EC2) with a 64-bit Ubuntu OS.
We are using TxTDB packaged directly from SVN (r1176416).
This seems to be a similar (or related) issue to:
https://issues.apache.org/jira/browse/JENA-91
Paolo