Hi,
I might be doing something stupid, but I think I produced a minimal
example which shows the problem.
I have two files: 1.ttl and 2.ttl.
Here they are:
---------[ 1.ttl ]---------
<http://example.com/1> <http://example.com/ns#p> <http://example.com/2> .
---------------------------
---------[ 2.ttl ]---------
<http://example.com/3> <http://example.com/ns#p> <http://example.com/4> .
---------------------------
This is what I do:
public static void dumpObjectFile(Location location) {
    ObjectFile objects = FileFactory.createObjectFileDisk(
        location.getPath(Names.indexId2Node, Names.extNodeData)) ;
    Iterator<Pair<Long,ByteBuffer>> iter = objects.all() ;
    while ( iter.hasNext() ) {
        System.out.println(iter.next()) ;
    }
}
public static void load(Location location, String filename) {
    StoreConnection sc = StoreConnection.make(location) ;
    DatasetGraphTxn dsg = sc.begin(ReadWrite.WRITE) ;
    TDBLoader.load(dsg, filename) ;
    dsg.commit() ;
    TDB.sync(dsg) ;
    dsg.close() ;
    StoreConnection.release(location) ;
}
public static void main(String[] args) {
    String path = "/home/castagna/Desktop/" ;
    Location location = new Location(path + "tdb") ;
    // 1
    load(location, path + "1.ttl") ;
    // 2
    // dumpObjectFile(location) ;
    // 3
    // load(location, path + "2.ttl") ;
    // 4
    // dumpObjectFile(location) ;
}
I first load the first file (i.e. 1.ttl).
Then I comment out step 1 and uncomment step 2: dumpObjectFile.
I then comment out step 2 and uncomment step 3 to load the second file (i.e. 2.ttl),
this time into an existing TDB location.
Finally, I comment out step 3 and uncomment step 4 to dump the object file again.
This time the nodes.dat file is corrupted.
I tried to use the Model.read(...) method instead of TDBLoader.load(...).
The effect is the same.
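For reference, the Model-based variant I mean looks roughly like this (just a sketch:
loadViaModel is a made-up name and these are the usual Jena calls, not necessarily the
exact ones I ran):

public static void loadViaModel(Location location, String filename) {
    StoreConnection sc = StoreConnection.make(location) ;
    DatasetGraphTxn dsg = sc.begin(ReadWrite.WRITE) ;
    // wrap the transactional default graph in a Model and read the file into it
    Model model = ModelFactory.createModelForGraph(dsg.getDefaultGraph()) ;
    FileManager.get().readModel(model, filename) ;  // syntax guessed from the file extension
    dsg.commit() ;
    dsg.close() ;
    StoreConnection.release(location) ;
}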
I also tried removing the TDB.sync(dsg) call, since I don't think it is necessary there.
The effect is the same.
Am I missing something obvious here?
Paolo
Simon Helsen wrote:
Paolo,
In our tests, we are not using TDBLoader.load directly, but we do use
public Model add( Model m ), which in turn calls
getBulkUpdateHandler().add( m.getGraph(), !suppressReifications );
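Roughly like this (a sketch, not our exact code: TxModelWrapper, its model field and
the suppressReifications flag are stand-ins for our own wrapper class):

class TxModelWrapper {
    private final Model model ;
    private final boolean suppressReifications ;

    TxModelWrapper(Model model, boolean suppressReifications) {
        this.model = model ;
        this.suppressReifications = suppressReifications ;
    }

    public Model add( Model m ) {
        // delegate to the bulk update handler of the underlying graph
        model.getGraph().getBulkUpdateHandler().add( m.getGraph(), !suppressReifications ) ;
        return model ;
    }
}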
Not sure if that helps in the analysis
Simon
From: Paolo Castagna <[email protected]>
To: [email protected]
Date: 09/28/2011 08:46 AM
Subject: Re: TxTDB - com.hp.hpl.jena.tdb.base.file.FileException: Impossibly large object
Hi,
I am currently investigating the issue.
So far, I managed to get an initial copy of the TDB indexes which is not corrupted (~2.6GB).
We then applied ~635 updates to it (and for each transaction I have the data which was
submitted). I then re-applied the changes with a little program which uses TxTDB only
(via TDBLoader.load(...)). At the end of this, the nodes.dat file is corrupted.
This is just doing:
StoreConnection sc = StoreConnection.make(location) ;
for ( int i = 1; i < 636; i++ ) {
    System.out.println(i) ;
    DatasetGraphTxn dsg = sc.begin(ReadWrite.WRITE) ;
    TDBLoader.load(dsg, "/tmp/updates/" + i + ".ttl") ;
    dsg.commit() ;
    dsg.close() ;
}
I tried to apply the same changes to an initially empty TDB database and there are no problems.
Now, I am double-checking the integrity of my initial TDB indexes.
I then proceed by applying one change at a time and verifying integrity (via a dump).
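The integrity check is just a sequential scan of nodes.dat, something along these lines
(a sketch; the 10 MB limit is an arbitrary sanity threshold I picked, not a TDB constant):

public static boolean verifyObjectFile(Location location) {
    ObjectFile objects = FileFactory.createObjectFileDisk(
        location.getPath(Names.indexId2Node, Names.extNodeData)) ;
    Iterator<Pair<Long,ByteBuffer>> iter = objects.all() ;
    while ( iter.hasNext() ) {
        Pair<Long,ByteBuffer> pair = iter.next() ;
        // flag anything far too large to plausibly be a serialized RDF node
        if ( pair.cdr().capacity() > 10 * 1024 * 1024 ) {
            System.err.println("Suspicious object " + pair.car() + " : "
                + pair.cdr().capacity() + " bytes") ;
            return false ;
        }
    }
    return true ;
}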
Paolo
Simon Helsen wrote:
thanks Paolo,
this is related to JENA-91. In fact, that is how our problems started.
Glad someone else was able to reproduce it.
Simon
From: Paolo Castagna <[email protected]>
To: [email protected]
Date: 09/28/2011 06:47 AM
Subject: Re: TxTDB - com.hp.hpl.jena.tdb.base.file.FileException: Impossibly large object
The object file of the node table (i.e. nodes.dat) is corrupted.
I tried to read it sequentially and I get:
(318670, java.nio.HeapByteBuffer[pos=0 lim=22 cap=22])
But, after that, the length of the next ByteBuffer is: 909129782 (*).
Paolo
(*) Running a simple program to iterate through all the Pair<Long, ByteBuffer>
in the ObjectFile and debugging it: ObjectFileDiskDirect, line 176.
Paolo Castagna wrote:
Hi,
we are using/testing TxTDB.
In this case, we just perform a series of WRITE transactions (sequentially, one
after the other) and then issue a SPARQL query (as a READ transaction).
There are no exceptions during the WRITE transactions.
This is the exception we see when we issue the SPARQL query:
com.hp.hpl.jena.tdb.base.file.FileException: ObjectFile.read(9863)[119398665][119079969]: Impossibly large object : 1752462448 bytes
    at com.hp.hpl.jena.tdb.base.objectfile.ObjectFileStorage.read(ObjectFileStorage.java:282)
    at com.hp.hpl.jena.tdb.lib.NodeLib.fetchDecode(NodeLib.java:60)
    at com.hp.hpl.jena.tdb.nodetable.NodeTableNative.readNodeFromTable(NodeTableNative.java:164)
    at com.hp.hpl.jena.tdb.nodetable.NodeTableNative._retrieveNodeByNodeId(NodeTableNative.java:88)
    at com.hp.hpl.jena.tdb.nodetable.NodeTableNative.getNodeForNodeId(NodeTableNative.java:59)
    at com.hp.hpl.jena.tdb.nodetable.NodeTableCache._retrieveNodeByNodeId(NodeTableCache.java:89)
    at com.hp.hpl.jena.tdb.nodetable.NodeTableCache.getNodeForNodeId(NodeTableCache.java:60)
    at com.hp.hpl.jena.tdb.nodetable.NodeTableWrapper.getNodeForNodeId(NodeTableWrapper.java:44)
    at com.hp.hpl.jena.tdb.nodetable.NodeTableInline.getNodeForNodeId(NodeTableInline.java:56)
    at com.hp.hpl.jena.tdb.nodetable.NodeTableWrapper.getNodeForNodeId(NodeTableWrapper.java:44)
    at com.hp.hpl.jena.tdb.solver.BindingTDB.get1(BindingTDB.java:92)
    at com.hp.hpl.jena.sparql.engine.binding.BindingBase.get(BindingBase.java:106)
    at com.hp.hpl.jena.sparql.core.ResultBinding._get(ResultBinding.java:44)
    at com.hp.hpl.jena.sparql.core.QuerySolutionBase.get(QuerySolutionBase.java:20)
    at com.hp.hpl.jena.sparql.resultset.ResultSetApply.apply(ResultSetApply.java:35)
    at com.hp.hpl.jena.sparql.resultset.JSONOutput.format(JSONOutput.java:23)
    at com.hp.hpl.jena.query.ResultSetFormatter.outputAsJSON(ResultSetFormatter.java:584)
    [...]
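As a side observation (my assumption about the cause, not a confirmed diagnosis):
1752462448 is 0x68747470, i.e. the ASCII bytes "http" read as a big-endian int, which
suggests the read is landing on node content where a length field is expected.
A quick check:

public class DecodeBogusLength {
    public static void main(String[] args) throws Exception {
        int bogusLength = 1752462448 ;  // the "impossibly large" size from the exception
        byte[] bytes = java.nio.ByteBuffer.allocate(4).putInt(bogusLength).array() ;
        System.out.println(new String(bytes, "US-ASCII")) ;  // prints "http"
    }
}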
This was with an Oracle JVM, 1.6.0_25 64-bit, on a VM (on EC2) with a 64-bit Ubuntu OS.
We are using TxTDB packaged directly from SVN (r1176416).
This seems to be a similar (or related) issue to:
https://issues.apache.org/jira/browse/JENA-91
Paolo