[ https://issues.apache.org/jira/browse/JENA-550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13780131#comment-13780131 ]

Andy Seaborne commented on JENA-550:
------------------------------------

Yes.  If a bulk load is interrupted, the state of the database is undefined.
The bulk loaders are written for speed, which means less checking and no
transaction around the load.  (Aside: I've added a test so that tdbloader2
requires an empty directory.)  The other loader, tdbloader, which can load
incrementally, has to assume that any existing database is valid.
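
As a rough illustration of the point above, a wrapper script could refuse to
bulk load into a non-empty directory and throw away the partial database if
the load fails, so that a later step (such as the text indexer) never reads
it.  This is only a sketch under assumptions: guarded_bulk_load and its load
argument are hypothetical names, and load stands in for whatever actually
invokes the bulk loader and raises on failure.

```python
import shutil
from pathlib import Path
from typing import Callable


def is_empty_dir(path: Path) -> bool:
    """Return True if path is an existing directory with no entries."""
    return path.is_dir() and not any(path.iterdir())


def guarded_bulk_load(db_dir: Path, load: Callable[[Path], None]) -> bool:
    """Run `load` against an empty db_dir and wipe db_dir if the load fails.

    `load` is a hypothetical stand-in for the step that runs the bulk
    loader; it must raise an exception when the load does not complete.
    Returns True if the load finished, False if it failed and the
    partial database was removed.
    """
    db_dir.mkdir(parents=True, exist_ok=True)
    if not is_empty_dir(db_dir):
        raise ValueError(f"refusing to bulk load into non-empty directory: {db_dir}")
    try:
        load(db_dir)
    except Exception:
        # An interrupted bulk load leaves the database in an undefined
        # state, so discard the partial files instead of reusing them.
        shutil.rmtree(db_dir)
        return False
    return True
```

A caller would run the next step only when guarded_bulk_load returns True.
In a real script, load might start the loader as a subprocess and raise on a
non-zero exit status; the exact command line is left out here.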

                
> "Impossibly Large Object" exception with command-line indexing
> --------------------------------------------------------------
>
>                 Key: JENA-550
>                 URL: https://issues.apache.org/jira/browse/JENA-550
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: TDB
>            Reporter: Leigh Dodds
>            Priority: Minor
>
> I have a script that calls tdbloader2 to create TDB indexes then the new 
> Lucene text indexer to create indexes.
> The first step completed successfully and then whilst the text indexer was 
> running I got the following stack trace:
> ERROR ObjectFileStorage.read[nodes](21694280)[filesize=32753969][file.size()=32753969]: Impossibly large object : 1668246831 bytes > filesize-(loc+SizeOfInt)=11059685
> com.hp.hpl.jena.tdb.base.file.FileException: ObjectFileStorage.read[nodes](21694280)[filesize=32753969][file.size()=32753969]: Impossibly large object : 1668246831 bytes > filesize-(loc+SizeOfInt)=11059685
>       at com.hp.hpl.jena.tdb.base.objectfile.ObjectFileStorage.read(ObjectFileStorage.java:346)
>       at com.hp.hpl.jena.tdb.lib.NodeLib.fetchDecode(NodeLib.java:78)
>       at com.hp.hpl.jena.tdb.nodetable.NodeTableNative.readNodeFromTable(NodeTableNative.java:178)
>       at com.hp.hpl.jena.tdb.nodetable.NodeTableNative._retrieveNodeByNodeId(NodeTableNative.java:103)
>       at com.hp.hpl.jena.tdb.nodetable.NodeTableNative.getNodeForNodeId(NodeTableNative.java:74)
>       at com.hp.hpl.jena.tdb.nodetable.NodeTableCache._retrieveNodeByNodeId(NodeTableCache.java:103)
>       at com.hp.hpl.jena.tdb.nodetable.NodeTableCache.getNodeForNodeId(NodeTableCache.java:74)
>       at com.hp.hpl.jena.tdb.nodetable.NodeTableWrapper.getNodeForNodeId(NodeTableWrapper.java:55)
>       at com.hp.hpl.jena.tdb.nodetable.NodeTableInline.getNodeForNodeId(NodeTableInline.java:67)
>       at com.hp.hpl.jena.tdb.lib.TupleLib.quad(TupleLib.java:161)
>       at com.hp.hpl.jena.tdb.lib.TupleLib.quad(TupleLib.java:153)
>       at com.hp.hpl.jena.tdb.lib.TupleLib.access$100(TupleLib.java:45)
>       at com.hp.hpl.jena.tdb.lib.TupleLib$4.convert(TupleLib.java:87)
>       at com.hp.hpl.jena.tdb.lib.TupleLib$4.convert(TupleLib.java:83)
>       at org.apache.jena.atlas.iterator.Iter$4.next(Iter.java:317)
>       at org.apache.jena.atlas.iterator.IteratorCons.next(IteratorCons.java:97)
>       at jena.textindexer.exec(textindexer.java:125)
>       at arq.cmdline.CmdMain.mainMethod(CmdMain.java:101)
>       at arq.cmdline.CmdMain.mainRun(CmdMain.java:63)
>       at arq.cmdline.CmdMain.mainRun(CmdMain.java:50)
>       at jena.textindexer.main(textindexer.java:55)
> No other code is touching the database, so I'm not clear how the node table 
> could have gotten corrupted. 
> During a previous run of the script I got an exception because of an invalid 
> URI:
> org.apache.jena.riot.RiotException: [line: 2, col: 110] illegal escape sequence value: , (0x2C)
>       at org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerStd.fatal(ErrorHandlerFactory.java:132)
> I'm wondering whether this exception might have been the cause of the 
> corruption?
> Deleting the index directories and re-running fixed the issue.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
