[ https://issues.apache.org/jira/browse/JENA-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17453380#comment-17453380 ]
Andy Seaborne commented on JENA-2204:
-------------------------------------

I cannot reproduce your figures. What is your JVM? What sort of disk setup do you have? What data is in the database you show above, given that it is only 2 Mbytes? It also looks like a 32-bit JVM, because the SPO index file size is 8192 bytes (a single block), not 8388608 bytes (8 Mbytes, which is sparse and only 8K of it is real).

50K BSBM is 10 Mbytes using TDB2 (any loader), and 21M when loaded into a named graph. Including unallocated space, it is 200M.

As noted above, TDB pre-allocates space with sparse memory-mapped files. They do not consume real disk space until needed, and that pre-allocated space is used up before the database grows again.

If you have been adding and deleting data rather than loading it in one go, things are different. Hence compaction.

If you copy the database, at least on Linux, the files are still sparse. If you zip the database and unzip it, the files are likely to be full size. Compaction restores a smaller database.

If you compact a database, the earlier versions are still there until you delete them. Look in the directory: there are multiple "Data-NNNN" subdirectories. The newest (highest-numbered) one is the current database; the rest are old copies as of that point in time. You can delete the rest (or zip them first and use them as backups, or keep them as a historical record, or ...).

Changing the block size will not change the actual space used significantly. The way to change the block size is to use {{StoreParams}} before creating the database. It is {{SegmentSize}} (in {{SystemIndex}}, not TDB2) that sets the size of the sparse files, but you seem to be running a 32-bit JVM, which does not use mmap files. (If you set the segment size small, you limit the size of the database, because your OS will run out of mmap segments.)

> Storage required by TDB2 is much higher than TDB1, How to Fix ?
> ---------------------------------------------------------------
>
> Key: JENA-2204
> URL: https://issues.apache.org/jira/browse/JENA-2204
> Project: Apache Jena
> Issue Type: Question
> Reporter: Hemant Tiwari
> Priority: Minor
>
> The storage required by TDB2 is much higher than TDB1
> For 100k statements - TDB1 takes about 90 MB, while TDB2 is taking ~ close to
> 1 GB.
> Why is there such a difference and is there any solution available to reduce
> the storage size?

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
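A minimal sketch for checking two points from the comment on an existing TDB2 database directory: which "Data-NNNN" subdirectory is the current one (the highest-numbered) and how much disk space the files really occupy. It compares the apparent size (what `ls -l` reports) with the allocated size derived from `st_blocks` (what `du` reports), so sparse pre-allocated files show up as large but mostly unallocated. This is a hypothetical helper, not part of Jena; the function names are illustrative, and it assumes a POSIX system where `os.stat` provides `st_blocks` in 512-byte units and that generation directories are named `Data-` followed by digits.

```python
import os
import re
from pathlib import Path

# Hypothetical helper, not part of Jena.
# Assumption: generation directories are named "Data-" followed by digits.
_DATA_DIR = re.compile(r"^Data-(\d+)$")


def data_generations(db_dir):
    """Return the Data-NNNN subdirectories sorted by numeric suffix."""
    gens = []
    for entry in Path(db_dir).iterdir():
        m = _DATA_DIR.match(entry.name)
        if m and entry.is_dir():
            gens.append((int(m.group(1)), entry))
    gens.sort()
    return [path for _, path in gens]


def split_current_and_old(db_dir):
    """Highest-numbered generation is current; the others are old copies."""
    gens = data_generations(db_dir)
    if not gens:
        raise ValueError(f"no Data-NNNN directories under {db_dir}")
    return gens[-1], gens[:-1]


def disk_usage(path):
    """Return (apparent_bytes, allocated_bytes) for all files under path.

    apparent_bytes sums st_size (the "ls -l" size). allocated_bytes sums
    st_blocks * 512 (the "du" size, assuming POSIX 512-byte units). For
    sparse files, allocated_bytes can be far smaller than apparent_bytes.
    """
    apparent = allocated = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            st = os.stat(os.path.join(root, name))
            apparent += st.st_size
            allocated += st.st_blocks * 512
    return apparent, allocated


def report(db_dir):
    """Return one summary line per generation, current generation first."""
    current, old = split_current_and_old(db_dir)
    lines = []
    for gen in [current, *old]:
        apparent, allocated = disk_usage(gen)
        status = "current" if gen == current else "old copy"
        lines.append(
            f"{gen.name}: apparent={apparent} bytes, "
            f"allocated={allocated} bytes [{status}]"
        )
    return lines
```

For example, `for line in report("/path/to/DB2"): print(line)` (the path is a placeholder). Only generations reported as "old copy" are candidates for deleting or archiving; the current one must be kept.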