Paolo Castagna wrote:
Hi,
I might be doing something stupid, but I think I produced a minimal
example which shows the problem.

I have two files, 1.ttl and 2.ttl.
Here they are:

---------[ 1.ttl ]---------
<http://example.com/1> <http://example.com/ns#p> <http://example.com/2> .
---------------------------

---------[ 2.ttl ]---------
<http://example.com/3> <http://example.com/ns#p> <http://example.com/4> .
---------------------------

This is what I do:

// Dump the raw contents of the node table object file (nodes.dat).
public static void dumpObjectFile(Location location) {
    ObjectFile objects = FileFactory.createObjectFileDisk(
        location.getPath(Names.indexId2Node, Names.extNodeData)) ;
    Iterator<Pair<Long,ByteBuffer>> iter = objects.all() ;
    while ( iter.hasNext() ) {
        System.out.println(iter.next()) ;
    }
}

// Load one file into the store in a single WRITE transaction.
public static void load(Location location, String filename) {
    StoreConnection sc = StoreConnection.make(location) ;
    DatasetGraphTxn dsg = sc.begin(ReadWrite.WRITE) ;
    TDBLoader.load(dsg, filename) ;
    dsg.commit() ;
    TDB.sync(dsg) ;
    dsg.close() ;
    StoreConnection.release(location) ;
}
public static void main(String[] args) {
    String path = "/home/castagna/Desktop/" ;
    Location location = new Location(path + "tdb") ;
    // 1
    load(location, path + "1.ttl") ;
    // 2
    // dumpObjectFile(location) ;
    // 3
    // load(location, path + "2.ttl") ;
    // 4
    // dumpObjectFile(location) ;
}

I first run step 1 to load the first file (i.e. 1.ttl).
Then I comment step 1 and uncomment step 2 to dump the object file.
I then comment step 2 and uncomment step 3 to load the second file (i.e. 2.ttl), this time into the already existing TDB location. Finally, I comment step 3 and uncomment step 4 to dump the object file again. This time the nodes.dat file is corrupted.

I tried to use the Model.read(...) method instead of TDBLoader.load(...).
The effect is the same.
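For reference, the Model-based variant I tried looks roughly like this (only a sketch; wrapping the default graph of the transactional dataset as a Model via ModelFactory.createModelForGraph is how I am getting a Model to call read(...) on):

// Sketch: load a Turtle file via Model.read(...) instead of TDBLoader.load(...).
public static void loadViaModel(Location location, String filename) {
    StoreConnection sc = StoreConnection.make(location) ;
    DatasetGraphTxn dsg = sc.begin(ReadWrite.WRITE) ;
    Model model = ModelFactory.createModelForGraph(dsg.getDefaultGraph()) ;
    model.read("file:" + filename, "TTL") ;   // read the Turtle file into the default graph
    dsg.commit() ;
    dsg.close() ;
    StoreConnection.release(location) ;
}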

I tried removing the TDB.sync(dsg) call, since I don't think it is necessary there.
The effect is the same.

Am I missing something obvious here?

Paolo


Simon Helsen wrote:
Paolo,

In our tests we are not using TDBLoader.load directly, but we do use a public Model add( Model m ) method which, in its turn, calls getBulkUpdateHandler().add( m.getGraph(), !suppressReifications );
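In outline it is something like this (a sketch only; tdbModel and suppressReifications stand in for our wrapper code):

// Sketch: add the triples of an in-memory Model to the TDB-backed Model
// via the graph's bulk update handler (called inside a WRITE transaction).
public Model add( Model m ) {
    tdbModel.getGraph().getBulkUpdateHandler().add( m.getGraph(), !suppressReifications ) ;
    return tdbModel ;
}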

Not sure if that helps in the analysis.

Simon



From:
Paolo Castagna <[email protected]>
To:
[email protected]
Date:
09/28/2011 08:46 AM
Subject:
Re: TxTDB - com.hp.hpl.jena.tdb.base.file.FileException: Impossibly large object



Hi,
I am currently investigating the issue.

So far, I managed to get an initial copy of the TDB indexes which is not corrupted (~2.6 GB). We then applied ~635 updates to it (and for each transaction I have the data that was submitted). I then re-applied the changes with a little program which uses TxTDB only (via TDBLoader.load(...)). At the end of this, the nodes.dat file is corrupted.

This is just doing:

StoreConnection sc = StoreConnection.make(location) ;
for ( int i = 1; i < 636; i++ ) {
    System.out.println(i) ;
    DatasetGraphTxn dsg = sc.begin(ReadWrite.WRITE) ;
    TDBLoader.load(dsg, "/tmp/updates/" + i + ".ttl") ;
    dsg.commit() ;
    dsg.close() ;
}

I tried to apply the same changes to an initially empty TDB database and there are no problems.

Now, I am double-checking the integrity of my initial TDB indexes.
I will then proceed by applying one change at a time and verifying the integrity (via a dump).
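In outline, something like this (only a sketch; in practice I run each load and each dump as a separate step):

// Sketch: re-apply the update files one at a time and dump the node table
// object file (nodes.dat) after each one, to find the first update after
// which the file is corrupted.
for ( int i = 1; i < 636; i++ ) {
    System.out.println("Applying " + i + ".ttl") ;

    StoreConnection sc = StoreConnection.make(location) ;
    DatasetGraphTxn dsg = sc.begin(ReadWrite.WRITE) ;
    TDBLoader.load(dsg, "/tmp/updates/" + i + ".ttl") ;
    dsg.commit() ;
    dsg.close() ;
    StoreConnection.release(location) ;

    // Dump nodes.dat and check every entry is readable.
    ObjectFile objects = FileFactory.createObjectFileDisk(
        location.getPath(Names.indexId2Node, Names.extNodeData)) ;
    Iterator<Pair<Long, ByteBuffer>> iter = objects.all() ;
    while ( iter.hasNext() )
        System.out.println(iter.next()) ;
}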

Paolo



Simon Helsen wrote:
Thanks Paolo,

This is related to JENA-91. In fact, that is how our problems started.

Glad someone else was able to reproduce it.

Simon



From:
Paolo Castagna <[email protected]>
To:
[email protected]
Date:
09/28/2011 06:47 AM
Subject:
Re: TxTDB - com.hp.hpl.jena.tdb.base.file.FileException: Impossibly large object



The object file of the node table (i.e. nodes.dat) is corrupted.

I tried to read it sequentially; I get:
(318670, java.nio.HeapByteBuffer[pos=0 lim=22 cap=22])
But, after that, the length of the next ByteBuffer is: 909129782 (*).

Paolo

(*) Running a simple program to iterate through all the Pair<Long, ByteBuffer> in the ObjectFile and debugging it: ObjectFileDiskDirect, line 176.
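For completeness, the check is essentially this (a sketch; the size threshold is arbitrary, just to flag the bad entry):

// Sketch: walk nodes.dat sequentially and flag any entry whose byte length
// is implausibly large for a serialized RDF node.
ObjectFile objects = FileFactory.createObjectFileDisk(
    location.getPath(Names.indexId2Node, Names.extNodeData)) ;
Iterator<Pair<Long, ByteBuffer>> iter = objects.all() ;
while ( iter.hasNext() ) {
    Pair<Long, ByteBuffer> pair = iter.next() ;
    int length = pair.cdr().limit() ;
    if ( length > 10 * 1024 * 1024 )   // arbitrary sanity threshold: 10 MB
        System.out.println("Suspect entry, id " + pair.car() + " : " + length + " bytes") ;
}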

Paolo Castagna wrote:
Hi,
we are using/testing TxTDB.

In this case, we just perform a series of WRITE transactions (sequentially, one after the other) and then issue a SPARQL query (as a READ transaction).
There are no exceptions during the WRITE transactions.
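The overall pattern is roughly this (a sketch only; the real data and query are much larger, updateFiles is illustrative, and wrapping the transactional DatasetGraph as a Dataset via DatasetImpl.wrap is an assumption about how the query is issued):

// Sketch: sequential WRITE transactions followed by a SPARQL SELECT in a READ transaction.
StoreConnection sc = StoreConnection.make(location) ;

for ( String filename : updateFiles ) {             // updateFiles: the update data (illustrative)
    DatasetGraphTxn dsg = sc.begin(ReadWrite.WRITE) ;
    TDBLoader.load(dsg, filename) ;
    dsg.commit() ;
    dsg.close() ;
}

DatasetGraphTxn dsgRead = sc.begin(ReadWrite.READ) ;
Dataset dataset = DatasetImpl.wrap(dsgRead) ;        // assumption: Dataset view over the txn dataset graph
QueryExecution qe = QueryExecutionFactory.create("SELECT * { ?s ?p ?o }", dataset) ;
ResultSetFormatter.outputAsJSON(System.out, qe.execSelect()) ;
qe.close() ;
dsgRead.close() ;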

This is the exception we see when we issue the SPARQL query:

com.hp.hpl.jena.tdb.base.file.FileException: ObjectFile.read(9863)[119398665][119079969]: Impossibly large object : 1752462448 bytes
    at com.hp.hpl.jena.tdb.base.objectfile.ObjectFileStorage.read(ObjectFileStorage.java:282)
    at com.hp.hpl.jena.tdb.lib.NodeLib.fetchDecode(NodeLib.java:60)
    at com.hp.hpl.jena.tdb.nodetable.NodeTableNative.readNodeFromTable(NodeTableNative.java:164)
    at com.hp.hpl.jena.tdb.nodetable.NodeTableNative._retrieveNodeByNodeId(NodeTableNative.java:88)
    at com.hp.hpl.jena.tdb.nodetable.NodeTableNative.getNodeForNodeId(NodeTableNative.java:59)
    at com.hp.hpl.jena.tdb.nodetable.NodeTableCache._retrieveNodeByNodeId(NodeTableCache.java:89)
    at com.hp.hpl.jena.tdb.nodetable.NodeTableCache.getNodeForNodeId(NodeTableCache.java:60)
    at com.hp.hpl.jena.tdb.nodetable.NodeTableWrapper.getNodeForNodeId(NodeTableWrapper.java:44)
    at com.hp.hpl.jena.tdb.nodetable.NodeTableInline.getNodeForNodeId(NodeTableInline.java:56)
    at com.hp.hpl.jena.tdb.nodetable.NodeTableWrapper.getNodeForNodeId(NodeTableWrapper.java:44)
    at com.hp.hpl.jena.tdb.solver.BindingTDB.get1(BindingTDB.java:92)
    at com.hp.hpl.jena.sparql.engine.binding.BindingBase.get(BindingBase.java:106)
    at com.hp.hpl.jena.sparql.core.ResultBinding._get(ResultBinding.java:44)
    at com.hp.hpl.jena.sparql.core.QuerySolutionBase.get(QuerySolutionBase.java:20)
    at com.hp.hpl.jena.sparql.resultset.ResultSetApply.apply(ResultSetApply.java:35)
    at com.hp.hpl.jena.sparql.resultset.JSONOutput.format(JSONOutput.java:23)
    at com.hp.hpl.jena.query.ResultSetFormatter.outputAsJSON(ResultSetFormatter.java:584)
    [...]

This was with an Oracle JVM, 1.6.0_25 64-bit, on a VM (on EC2) with a 64-bit Ubuntu OS. We are using TxTDB packaged directly from SVN (r1176416).

This seems to be a similar (or related) issue to:
https://issues.apache.org/jira/browse/JENA-91

Paolo










