[ 
https://issues.apache.org/jira/browse/JENA-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17453380#comment-17453380
 ] 

Andy Seaborne commented on JENA-2204:
-------------------------------------

I can not reproduce your figures.

What is your JVM?
What sort of disk setup do you have?

What data is in the database you show above? That one is only 2Mbytes.
And it looks like a 32-bit JVM because the SPO index file size is 8192 bytes, a 
single block, not 8388608 (which is a sparse file that really occupies only 8k). 

50K BSBM is 10Mbytes using TDB2 (any loader), 21M when loaded into a named 
graph.

Including unallocated space, it is 200M. As noted above, TDB pre-allocates 
space with sparse memory-mapped files. They do not consume real disk space 
until needed. The pre-allocated space is used up before the files grow again.
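
To compare the size a directory listing reports with the space actually allocated on disk, something like the following can be run against a database directory. It is a minimal Python sketch, not part of Jena. It assumes a POSIX system where {{os.stat}} exposes {{st_blocks}} in 512-byte units, and the function name {{disk_usage}} is made up for illustration.

```python
import os


def disk_usage(root):
    """Return (apparent_bytes, allocated_bytes) summed over all files under root.

    Assumption: POSIX system, where st_blocks counts 512-byte units.
    """
    apparent = 0
    allocated = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            apparent += st.st_size           # size reported by a plain listing
            allocated += st.st_blocks * 512  # space actually allocated on disk
    return apparent, allocated


if __name__ == "__main__":
    import sys

    # Directory to inspect; defaults to the current directory.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    apparent, allocated = disk_usage(root)
    print(f"apparent:  {apparent} bytes")
    print(f"allocated: {allocated} bytes")
```

For sparse files the allocated figure should come out well below the apparent one, provided the filesystem supports sparse files.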

If you have been adding and deleting data, not loading in one go, things are 
different. Hence compaction.

If you copy the database, at least on Linux, the files are still sparse.
If you zip the database and unzip it, the files are likely to be full size. 
Compaction restores a smaller database.

If you compact a database, the earlier versions are still there until you 
delete them.

Look in the directory - there are multiple "Data-NNNN" subdirectories. The 
newest/highest-numbered one is the current database; the rest are old copies 
from earlier points in time. You can delete the rest (or zip them first and use 
them as backups, or keep them as a historical record, or ...).
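
The directory layout described above can be inspected with a few lines of script. This is a minimal Python sketch, not a Jena tool. The function name {{split_generations}} is hypothetical, and it assumes only the "Data-NNNN" naming described here.

```python
import os
import re

# Matches the generation directories described above, e.g. "Data-0001".
_GENERATION = re.compile(r"^Data-(\d+)$")


def split_generations(db_dir):
    """Return (current, old) for a database directory.

    current is the name of the highest-numbered Data-NNNN subdirectory
    (None if there is none); old lists the remaining ones in ascending
    numeric order.
    """
    found = []
    for entry in os.listdir(db_dir):
        match = _GENERATION.match(entry)
        if match and os.path.isdir(os.path.join(db_dir, entry)):
            found.append((int(match.group(1)), entry))
    found.sort()  # numeric order, so Data-0010 sorts after Data-0002
    names = [name for _number, name in found]
    if not names:
        return None, []
    return names[-1], names[:-1]
```

The sketch only reports names. Deleting or archiving the old copies is left as a manual step.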

Changing the blocksize will not change the actual space used significantly. The 
way to change the block size is using {{StoreParams}} before creating the 
database.

It is {{SegmentSize}} (in {{SystemIndex}}, not TDB2) that determines the sparse 
file size, but you seem to be running a 32-bit JVM, which does not use mmap files.
(If you set the segment size small, you limit the size of the database because 
your OS will run out of mmap segments.)



> Storage required by TDB2 is much higher than TDB1, How to Fix ?
> ---------------------------------------------------------------
>
>                 Key: JENA-2204
>                 URL: https://issues.apache.org/jira/browse/JENA-2204
>             Project: Apache Jena
>          Issue Type: Question
>            Reporter: Hemant Tiwari
>            Priority: Minor
>
> The storage required by TDB2 is much higher than TDB1
> For 100k statements - TDB1 takes about 90 MB, while TDB2 is taking ~ close to 
> 1 GB.
> Why is there such a difference and is there any solution available to reduce 
> the storage size?



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
