[ https://issues.apache.org/jira/browse/JENA-804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628525#comment-14628525 ]
Trevor Donaldson edited comment on JENA-804 at 7/15/15 7:37 PM:
----------------------------------------------------------------
I agree with Keith. My team and my customer are unable to use Jena because of
this issue. We often delete a graph and replace the data in it. Doing this a
few times caused our storage to reach critical mass very quickly.
was (Author: tmdonalds):
I agree with Keith. We are unable to use Jena because of this issue. We often
delete a graph and replace the data inside of a graph. Doing this a few times
caused us to reach critical mass in our storage very quickly.
> Jena is not reusing already allocated space on the file system which results
> in large amounts of disk space reserved by Jena files
> ----------------------------------------------------------------------------------------------------------------------------------
>
> Key: JENA-804
> URL: https://issues.apache.org/jira/browse/JENA-804
> Project: Apache Jena
> Issue Type: Bug
> Components: Jena
> Affects Versions: Jena 2.11.2, TDB 1.0.2
> Environment: Windows 7, IBM JRE 1.7, Tomcat 7.0.54
> Reporter: Keith Wells
> Attachments: TdbGrowthTests.java, out.txt, test-tdb-size.sh
>
>
> We have a product based on Jena TDB that both inserts quads into Jena TDB
> and deletes quads from it. We understand the architectural decision to favor
> performance over space by not cleaning up deleted node IDs from the indexes.
> However, the disk space usage suggests that Jena TDB is not reusing space it
> had previously allocated. Based on the following comment, something appears
> to be wrong with file space utilization, see
> http://mail-archives.apache.org/mod_mbox/jena-users/201310.mbox/%3cce7d7929.2a707%[email protected]%3E:
> "The indexes won't shrink - TDB never gives disk space back to the OS - but
> disk space is reused when reallocated within the same JVM.".
> In this scenario, running in the same JVM with NO server stops or starts, we
> add 27765 graphs to IndexTdb and immediately remove them, repeating this
> process several times.
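The measurement behind the table below can be sketched as a small Java harness. This is a minimal sketch, not the attached TdbGrowthTests.java: it assumes the reported size is the total size of the regular files in the TDB database directory, and `reindexOnce` is a hypothetical placeholder for whatever code removes and re-adds the 27765 graphs.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

final class TdbSizeProbe {

    // Assumption: the space Jena TDB reserves is the total size of the regular
    // files under the database directory (subdirectories included).
    static long directorySize(Path dir) throws IOException {
        try (Stream<Path> paths = Files.walk(dir)) {
            return paths.filter(Files::isRegularFile)
                    .mapToLong(p -> {
                        try {
                            return Files.size(p);
                        } catch (IOException e) {
                            throw new UncheckedIOException(e);
                        }
                    })
                    .sum();
        }
    }

    // Records the size before any cycle, then after each call to the
    // hypothetical reindexOnce step (remove all graphs, then add them back).
    // Everything runs inside one JVM, with no restart between measurements.
    static List<Long> measure(Path dir, int cycles, Runnable reindexOnce) throws IOException {
        List<Long> sizes = new ArrayList<>();
        sizes.add(directorySize(dir));
        for (int i = 0; i < cycles; i++) {
            reindexOnce.run();
            sizes.add(directorySize(dir));
        }
        return sizes;
    }
}
```

Passing the TDB directory and a reindex step that runs in the same process mirrors the described setup; the returned list corresponds to the Bytes column.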
> {noformat}
>              MB      Bytes  Diff (Bytes)
> Start       193  203239424
>
> Reindex 5   249  262066176      58826752
> Reindex 6   249  262086656         20480
> Reindex 10  298  312500224      50413568
> Reindex 11  298  312520704         20480
> Reindex 12  298  312541184         20480
> Reindex 13  298  312586240         45056
> Reindex 14  306  320995328       8409088
> Reindex 15  330  346181632      25186304
> Reindex 16  330  346198538         16906
> Reindex 17  346  362999808      16801270
> Reindex 18  346  363020288         20480
> Reindex 19  346  363040768         20480
> Reindex 20  346  363061248         20480
> Reindex 21  346  363081728         20480
> Reindex 22  354  371490816       8409088
> Reindex 23  378  396677120      25186304
>
> End         193  203239424
> {noformat}
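The MB and Diff columns can be recomputed from the Bytes column as a consistency check. The sketch assumes MB means bytes divided by 1,048,576 and rounded down (this matches every row), and that Diff is the change since the previously listed row, so Reindex 5 is compared with Start and Reindex 10 with Reindex 6.

```java
final class TdbGrowthTable {

    // Rows copied from the report; reindexes 1-4 and 7-9 were not listed.
    static final String[] LABELS = {
        "Start", "Reindex 5", "Reindex 6", "Reindex 10", "Reindex 11", "Reindex 12",
        "Reindex 13", "Reindex 14", "Reindex 15", "Reindex 16", "Reindex 17",
        "Reindex 18", "Reindex 19", "Reindex 20", "Reindex 21", "Reindex 22", "Reindex 23"
    };
    static final long[] REPORTED_BYTES = {
        203239424L, 262066176L, 262086656L, 312500224L, 312520704L, 312541184L,
        312586240L, 320995328L, 346181632L, 346198538L, 362999808L,
        363020288L, 363040768L, 363061248L, 363081728L, 371490816L, 396677120L
    };

    // Assumption: "MB" is bytes / 2^20, rounded down.
    static long toMb(long bytes) {
        return bytes / (1024L * 1024L);
    }

    // Change since the previously listed row; the first row has no predecessor, so 0.
    static long[] diffs(long[] sizes) {
        long[] d = new long[sizes.length];
        for (int i = 1; i < sizes.length; i++) {
            d[i] = sizes[i] - sizes[i - 1];
        }
        return d;
    }

    public static void main(String[] args) {
        long[] diff = diffs(REPORTED_BYTES);
        for (int i = 0; i < REPORTED_BYTES.length; i++) {
            System.out.printf("%-11s %4d %10d %10d%n",
                    LABELS[i], toMb(REPORTED_BYTES[i]), REPORTED_BYTES[i], diff[i]);
        }
        long growth = REPORTED_BYTES[REPORTED_BYTES.length - 1] - REPORTED_BYTES[0];
        System.out.printf("Start -> Reindex 23: %d bytes (%d MB)%n", growth, toMb(growth));
    }
}
```

The diffs add up to 193437696 bytes between Start and Reindex 23, i.e. 184 MB by the same rounding, so the store nearly doubled in size although each reindex removes and re-adds the same graphs.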
> The system starts with 193 MB of data allocated by indexTdb. A reindex
> consists of a remove followed by an add of these graphs. As you can see from
> the data, the size of indexTdb on disk increases dramatically after
> repeatedly removing and adding graphs: after Reindex 23, 378 MB of disk space
> is in use. If Jena TDB reused allocated space, there would be no need to
> allocate more space beyond what is held by deleted node IDs (unless node ID
> storage is consuming all of this space?). Jena does not appear to be reusing
> the allocated disk space. At the very end of this scenario, we exported the
> data as N-Quads and reloaded it, which brought the disk usage back to the
> original 193 MB.
> We believe Jena TDB is not reusing the space allocated by the TDB file system
> within the same JVM.
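The expectation stated in the report can be made concrete with a toy model. This is a hypothetical block file, not TDB's actual block manager: freed blocks either go onto a free list and are handed out again, or are simply abandoned. With reuse, repeatedly removing and re-adding the same amount of data keeps the file at its high-water mark; without reuse, every cycle extends the file by the full amount.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical toy model for illustration only; NOT Jena TDB's block manager.
final class ToyBlockFile {
    private final boolean reuseFreedBlocks;
    private final Deque<Integer> freeList = new ArrayDeque<>();
    private int fileBlocks = 0; // high-water mark: the file never shrinks

    ToyBlockFile(boolean reuseFreedBlocks) {
        this.reuseFreedBlocks = reuseFreedBlocks;
    }

    int allocate() {
        if (reuseFreedBlocks && !freeList.isEmpty()) {
            return freeList.pop(); // hand a previously freed block out again
        }
        return fileBlocks++; // otherwise extend the file by one block
    }

    void release(int block) {
        if (reuseFreedBlocks) {
            freeList.push(block);
        }
        // without reuse, the freed block is abandoned and never handed out again
    }

    int sizeInBlocks() {
        return fileBlocks;
    }

    // Loads blocksPerLoad blocks, then runs `reindexes` remove-then-add cycles.
    static int simulate(boolean reuse, int blocksPerLoad, int reindexes) {
        ToyBlockFile file = new ToyBlockFile(reuse);
        List<Integer> live = new ArrayList<>();
        for (int i = 0; i < blocksPerLoad; i++) live.add(file.allocate());
        for (int r = 0; r < reindexes; r++) {
            for (int b : live) file.release(b); // remove the graphs
            live.clear();
            for (int i = 0; i < blocksPerLoad; i++) live.add(file.allocate()); // add them back
        }
        return file.sizeInBlocks();
    }

    public static void main(String[] args) {
        System.out.println("with reuse:    " + simulate(true, 1000, 23) + " blocks");
        System.out.println("without reuse: " + simulate(false, 1000, 23) + " blocks");
    }
}
```

In this model `simulate(true, 1000, 23)` stays at 1000 blocks while `simulate(false, 1000, 23)` ends at 24000. In the reported numbers, the size keeps climbing in steps through Reindex 23 instead of levelling off.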
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)