On Thu, Aug 15, 2013 at 8:47 AM, Andy Seaborne <a...@apache.org> wrote:
>
> On 15/08/13 10:21, Knut-Olav Hoven wrote:
>>
>> Hi!
>
> Hi there - thanks for the detailed report.
>
>> Two issues, related to memory usage: import and delete of large graphs.
>>
>> I am currently doing some tests with a 128MB heap and a little over 1M
>> tuples. I know I can throw a lot of memory at the problem, but sooner or
>> later I will run out.
>
> There are some fixed-size caches (as you've discovered) - 128M is likely
> to be too small for them.
>
>> I've noticed that TDB takes the complete result set into memory when
>> calling "DatasetGraphTDB.deleteAny" before looping over all of them to
>> delete them. This is a problem for very large graphs if I try to delete
>> the entire graph or a large selection.
>
> There is supposed to be a specific implementation of deleteAny which is
> like GraphTDB.removeWorker. But there isn't. Actually, I don't see why
> GraphTDB.removeWorker needs to exist if a proper DatasetGraphTDB.deleteAny
> existed.
>
> Recorded as JENA-513.
>
> I'll sort this out by moving GraphTDB.removeWorker to DatasetGraphTDB and
> using it for deleteAny(...) and from GraphTDB.remove.
>
> The GraphTDB.removeWorker code gets batches of 1000 items, deletes them,
> and tries again until there is nothing more matching the delete pattern.
> Deletes are not done by iterator.

So as an alternative, you can use SPARQL Update combined with setting the
ARQ.spillToDiskThreshold parameter to a reasonable value (10,000 maybe?).
This enables stream-to-disk handling of the intermediate bindings for
DELETE/INSERT/WHERE requests (as well as for several of the SPARQL
operators in the WHERE clause, see JENA-119). This should mostly eliminate
the memory limits, except for TDB's BlockMgrJournal.
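For example (just a sketch - the dataset location and graph name are made
up, and the imports assume the current Jena 2.x package layout):

  import com.hp.hpl.jena.query.ARQ;
  import com.hp.hpl.jena.query.Dataset;
  import com.hp.hpl.jena.query.ReadWrite;
  import com.hp.hpl.jena.tdb.TDBFactory;
  import com.hp.hpl.jena.update.UpdateAction;
  import com.hp.hpl.jena.update.UpdateFactory;
  import com.hp.hpl.jena.update.UpdateRequest;

  public class SpillingDeleteExample {
      public static void main(String[] args) {
          // Spill intermediate bindings to disk once more than 10,000
          // rows have accumulated.
          ARQ.getContext().set(ARQ.spillToDiskThreshold, 10000L);

          Dataset dataset = TDBFactory.createDataset("/path/to/tdb"); // illustrative
          dataset.begin(ReadWrite.WRITE);
          try {
              // Clear one (large) graph via DELETE/WHERE rather than deleteAny().
              UpdateRequest request = UpdateFactory.create(
                  "DELETE { GRAPH <http://example/g> { ?s ?p ?o } } " +
                  "WHERE  { GRAPH <http://example/g> { ?s ?p ?o } }");
              UpdateAction.execute(request, dataset);
              dataset.commit();
          } finally {
              dataset.end();
          }
      }
  }

The threshold can also be set on an individual dataset's context instead of
the global ARQ context if you don't want it to apply everywhere.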
> That said, having iterator remove() code for RecordRangeIterator and in
> TupleTable would be excellent regardless of this. When I went looking for
> BTree code originally, I found various possibilities, but all were too
> closely tied to their usage to be reusable. We could pull the B+Tree code
> out into a reusable module.
>
> There are some RecordRangeIterator cases that will not work with
> Iterator.remove() ... for example, when the B+Tree is not on the same
> machine as the TupleIndex client.
>
>> I figured out a way to make the iterators backed by indexes/nodes and can
>> now delete each directly from the iterator. Just hope I have covered all
>> cases by implementing remove() in RecordRangeIterator and in TupleTable
>> (connected to all indexes). This was the "easy" part.
>>
>> The difficult part is the Transaction and Journal, which don't write to
>> the journal until the transaction is just about to be committed. This
>> means that many Block objects are kept in memory in the HashMap
>> "BlockMgrJournal.writeBlocks".
>
> Yes - this is a limitation of the current transaction system. The blocks
> may still be accessed, so they can't be written to the journal and
> forgotten. There could be a cache that knows where the block is in the
> journal and fetches it back (minor, but then the journal is jumbled; if it
> is in numerical block order, the writes for flushing back to disk are
> likely more efficient).
>
> My very long term approach would be to use immutable B+Trees where the
> blocks up the tree to the root are copied when a block first changes. This
> means that transactional data is written once, during the write
> transaction. Commit means switching to the new root for all subsequent
> transactions. Old trees remain. The hard part is that the tree needs to be
> garbage collected. Typically, this is done by a background task writing a
> new copy. cf. CouchDB, BDB-JE (?) and Mulgara (not B+Trees, but the same
> approach), amongst others.
>
> This is a not insignificant rewrite of the B+Tree and BlockMgr code.
>
> If there were a spill cache for BlockMgrJournal, that would be a great
> thing to have. It's a much more direct way to get scalable transactions
> and it works without a DB format change.

Agreed. Unfortunately the *DataBag classes require all data to be written
before any reading occurs, which makes them inappropriate here. Can't we
just use another disk-backed B+Tree as a temporary store instead of the
in-memory HashMap? I've actually been running into this issue because, now
that streaming SPARQL Update support is available, I find I am generating
and streaming so much data in a single transaction that I need to devote a
not-insignificant amount of heap just to storing the pending blocks.
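To make that concrete, even something much simpler than a B+Tree would do
as a first cut: an overflow map that keeps the first N pending blocks on
the heap and appends the rest to a temp file keyed by block id. Purely
illustrative - none of this is TDB code and all of the names are invented:

  import java.io.IOException;
  import java.nio.ByteBuffer;
  import java.nio.channels.FileChannel;
  import java.nio.file.Path;
  import java.nio.file.StandardOpenOption;
  import java.util.HashMap;
  import java.util.Map;

  /** Hypothetical spill cache for pending blocks: heap up to a threshold, then a temp file. */
  public class SpillingBlockMap {
      private final int memoryThreshold;
      private final Map<Long, ByteBuffer> inMemory = new HashMap<>();
      private final Map<Long, Long> spillOffsets = new HashMap<>();   // block id -> file offset
      private final Map<Long, Integer> spillSizes = new HashMap<>();  // block id -> length
      private final FileChannel spillFile;
      private long nextOffset = 0;

      public SpillingBlockMap(int memoryThreshold, Path spillPath) throws IOException {
          this.memoryThreshold = memoryThreshold;
          this.spillFile = FileChannel.open(spillPath,
                  StandardOpenOption.CREATE, StandardOpenOption.READ,
                  StandardOpenOption.WRITE, StandardOpenOption.DELETE_ON_CLOSE);
      }

      /** Record a pending (dirty) block. */
      public void put(long blockId, ByteBuffer data) throws IOException {
          if (inMemory.size() < memoryThreshold || inMemory.containsKey(blockId)) {
              inMemory.put(blockId, data);
              return;
          }
          // Over the threshold: append to the spill file and remember where it went
          // (a single write call is assumed to be enough for a block-sized buffer).
          int size = data.remaining();
          spillFile.write(data.duplicate(), nextOffset);
          spillOffsets.put(blockId, nextOffset);
          spillSizes.put(blockId, size);
          nextOffset += size;
      }

      /** Fetch a pending block, reading it back from the spill file if needed. */
      public ByteBuffer get(long blockId) throws IOException {
          ByteBuffer b = inMemory.get(blockId);
          if (b != null) return b;
          Long offset = spillOffsets.get(blockId);
          if (offset == null) return null; // not a pending block
          ByteBuffer out = ByteBuffer.allocate(spillSizes.get(blockId));
          spillFile.read(out, offset);
          out.flip();
          return out;
      }
  }

A disk-backed B+Tree keyed by block id would look the same to
BlockMgrJournal; the flat file plus offset map above just avoids dragging
in an index structure for what is essentially an overflow area.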
>> Trying to fix this by just writing to the journal directly results in
>> another issue in all those unit tests that open multiple transactions.
>> The problem is that the journal is not replayed onto the database files
>> if there are any transactions open. The reason BlockMgrJournal works in
>> those tests is that the writeBlocks HashMap is never cleared after a
>> transaction (and the other transactions hit that one instead of the
>> backing files).
>>
>> I also encountered a case during import that led to a corrupt database
>> that I could not recover. I always got an exception from
>> "ObjectFileStorage.read" telling me that I had an "Impossibly large
>> object".
>>
>> Those cases always started with an OutOfMemoryError during import while
>> writing to the database files. By lowering the Node2NodeIdCacheSize and
>> NodeId2NodeCacheSize caches and splitting the import files into smaller
>> batches/transactions it went fine. It seems to recover if I just return
>> an empty ByteBuffer instead of throwing the exception, but that would
>> just cover up a bad state, I guess. Maybe some optimization can be done
>> to the part where the journal is spooled onto the database files, to
>> avoid the OutOfMemoryError issue altogether and so avoid corrupt
>> databases.
>
> Sorry - if "Impossibly large object" happens, the database is
> unrecoverable. The problem happened at write time - it's just detected at
> read time.
>
>> Should I open some issues in Jira?
>
> Please do.
>
>> I can provide some patches for the iterator remove() functions.
>
> Awesome.
>
>> Sincerely,
>>
>> Knut-Olav Hoven
>> NRK, Norwegian Broadcaster Corporation
>
> Andy
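One more note on the import side: splitting the load into several smaller
write transactions, as Knut-Olav did, is also what I'd do for now to keep
the journal and the pending writeBlocks map bounded. Roughly like this (a
sketch only - the file paths are made up and it assumes the input has
already been split into separately loadable chunks):

  import com.hp.hpl.jena.query.Dataset;
  import com.hp.hpl.jena.query.ReadWrite;
  import com.hp.hpl.jena.rdf.model.Model;
  import com.hp.hpl.jena.tdb.TDBFactory;
  import org.apache.jena.riot.RDFDataMgr;

  public class BatchedImport {
      public static void main(String[] args) {
          // One write transaction per chunk keeps the amount of pending
          // journal data per commit small.
          String[] chunks = { "/data/chunk-01.nt", "/data/chunk-02.nt" }; // illustrative
          Dataset dataset = TDBFactory.createDataset("/path/to/tdb");     // illustrative

          for (String chunk : chunks) {
              dataset.begin(ReadWrite.WRITE);
              try {
                  Model model = dataset.getDefaultModel();
                  RDFDataMgr.read(model, chunk);  // parse and add this chunk only
                  dataset.commit();
              } finally {
                  dataset.end();
              }
          }
      }
  }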