Keith Wells created JENA-804:
--------------------------------
Summary: Jena is not reusing already allocated space on the file
system which results in large amounts of disk space reserved by Jena files
Key: JENA-804
URL: https://issues.apache.org/jira/browse/JENA-804
Project: Apache Jena
Issue Type: Bug
Components: Jena
Affects Versions: TDB 1.0.2
Environment: Windows 7, IBM JRE 1.7, Tomcat 7.0.54
Reporter: Keith Wells
We have a product based on Jena TDB in which we both insert and delete quads.
We understand the architectural decision to favor performance over space by
not cleaning up deleted node IDs from the indexes. However, the disk usage
suggests that Jena TDB is not reusing space it has previously allocated.
According to this comment on the jena-users mailing list, space should be
reused within the same JVM, so something appears to be wrong with file space
utilization:
http://mail-archives.apache.org/mod_mbox/jena-users/201310.mbox/%3cce7d7929.2a707%[email protected]%3E
"The indexes won't shrink - TDB never gives disk space back to the OS - but
disk space is reused when reallocated within the same JVM."
In this scenario, within the same JVM and with NO server stops or starts, we
add 27765 graphs to IndexTdb and immediately remove them, repeating this
process several times.
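The report does not say how the sizes in the table below were measured. A minimal sketch, assuming each figure is the sum of the regular file sizes in the TDB database directory, that takes one such snapshot using only Java 7 NIO.2 APIs (to match the IBM JRE 1.7 listed in the environment). The class name `TdbDirSize` and the default directory `tdb` are hypothetical.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

public class TdbDirSize {

    /**
     * Sums the sizes of all regular files under dir, recursively.
     * Assumption: this is how "disk space used" is measured; the report
     * does not state its method.
     */
    static long directoryBytes(Path dir) throws IOException {
        // Java 7 anonymous classes can only capture final locals,
        // so the running total lives in a one-element array.
        final long[] total = {0L};
        Files.walkFileTree(dir, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                if (attrs.isRegularFile()) {
                    total[0] += attrs.size();
                }
                return FileVisitResult.CONTINUE;
            }
        });
        return total[0];
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default; pass the real TDB database directory as args[0].
        Path dir = Paths.get(args.length > 0 ? args[0] : "tdb");
        long bytes = directoryBytes(dir);
        // Whole MB (bytes / 2^20, rounded down), as in the table's MB column.
        System.out.println((bytes / (1024L * 1024L)) + " MB, " + bytes + " bytes");
    }
}
```

Running it before the first cycle and after each remove/add cycle would produce one row of the table per run.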
             MB      Bytes  Diff (Bytes)
Start       193  203239424
Reindex 5   249  262066176      58826752
Reindex 6   249  262086656         20480
Reindex 10  298  312500224      50413568
Reindex 11  298  312520704         20480
Reindex 12  298  312541184         20480
Reindex 13  298  312586240         45056
Reindex 14  306  320995328       8409088
Reindex 15  330  346181632      25186304
Reindex 16  330  346198538         16906
Reindex 17  346  362999808      16801270
Reindex 18  346  363020288         20480
Reindex 19  346  363040768         20480
Reindex 20  346  363061248         20480
Reindex 21  346  363081728         20480
Reindex 22  354  371490816       8409088
Reindex 23  378  396677120      25186304
End         193  203239424
Diff is the growth since the previous row listed (Reindex 1-4 and 7-9 are not
listed). The End row was taken after exporting the N-Quads and reloading them.
The system starts with 193 MB of data allocated by indexTdb. A reindex
consists of removing these graphs and then adding them back. As the data
shows, the size of indexTdb on disk increases dramatically after repeatedly
removing and adding graphs: after Reindex 23, 378 MB of disk space is used.
If Jena TDB reused allocated space, there would be no need to allocate more
space beyond what is taken by deleted node IDs (unless node ID storage is
consuming all of this space?). Jena does not appear to be reusing the
allocated disk space. At the very end of this scenario, we exported the
N-Quads and reloaded them, which brought the disk usage back to the original
193 MB.
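The MB column is the Bytes column divided by 2^20 and rounded down, and each Diff is the growth since the previous listed row. A small sketch that recomputes both from the Bytes values in the table above, plus the net growth from Start to Reindex 23; the class name `ReindexGrowth` is hypothetical and the figures are copied from this report.

```java
public class ReindexGrowth {

    // Rows copied from the table in this report (Reindex 1-4 and 7-9 not listed).
    static final String[] LABELS = {
        "Start", "Reindex 5", "Reindex 6", "Reindex 10", "Reindex 11",
        "Reindex 12", "Reindex 13", "Reindex 14", "Reindex 15", "Reindex 16",
        "Reindex 17", "Reindex 18", "Reindex 19", "Reindex 20", "Reindex 21",
        "Reindex 22", "Reindex 23"
    };
    static final long[] BYTES = {
        203239424L, 262066176L, 262086656L, 312500224L, 312520704L,
        312541184L, 312586240L, 320995328L, 346181632L, 346198538L,
        362999808L, 363020288L, 363040768L, 363061248L, 363081728L,
        371490816L, 396677120L
    };

    /** Whole MB (2^20 bytes), rounded down, as in the MB column. */
    static long mb(long bytes) {
        return bytes / (1024L * 1024L);
    }

    /** Growth relative to the previous listed row; 0 for the first row. */
    static long diff(int i) {
        return i == 0 ? 0L : BYTES[i] - BYTES[i - 1];
    }

    public static void main(String[] args) {
        for (int i = 0; i < BYTES.length; i++) {
            System.out.printf("%-11s %4d %10d %10d%n",
                    LABELS[i], mb(BYTES[i]), BYTES[i], diff(i));
        }
        long growth = BYTES[BYTES.length - 1] - BYTES[0];
        System.out.println("Net growth: " + growth + " bytes (" + mb(growth) + " MB)");
    }
}
```

The net growth works out to 193437696 bytes (184 MB), so the index files nearly doubled in size over the runs shown.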
We believe Jena TDB is not reusing the space allocated by the TDB file system
within the same JVM.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)