[ 
https://issues.apache.org/jira/browse/JENA-804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628525#comment-14628525
 ] 

Trevor Donaldson edited comment on JENA-804 at 7/15/15 7:37 PM:
----------------------------------------------------------------

I agree with Keith. My team and my customer are unable to use Jena because of 
this issue. We often delete a graph and replace the data inside it. 
Doing this a few times caused our storage use to reach a critical level very 
quickly. 


was (Author: tmdonalds):
I agree with Keith. We are unable to use Jena because of this issue. We often 
delete a graph and replace the data inside of a graph. Doing this a few times 
caused us to reach critical mass in our storage very quickly. 

> Jena is not reusing already allocated space on the file system which results 
> in large amounts of disk space reserved by Jena files
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: JENA-804
>                 URL: https://issues.apache.org/jira/browse/JENA-804
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: Jena
>    Affects Versions: Jena 2.11.2, TDB 1.0.2
>         Environment: Windows 7, IBM JRE 1.7, Tomcat 7.0.54
>            Reporter: Keith Wells
>         Attachments: TdbGrowthTests.java, out.txt, test-tdb-size.sh
>
>
> We have a product based on Jena TDB in which we both insert quads into and 
> delete quads from Jena TDB.  We understand the architectural decision to 
> favor performance over space by not cleaning up deleted node ids from the 
> indexes. But the disk usage suggests that Jena TDB is not reusing space that 
> it had previously allocated.  Based on this comment, something appears to be 
> wrong with file space utilization, 
> http://mail-archives.apache.org/mod_mbox/jena-users/201310.mbox/%3cce7d7929.2a707%[email protected]%3E:
>  "The indexes won't shrink - TDB never gives disk space back to the OS -  but 
> disk space is reused when reallocated within the same JVM.".
> In this scenario on the same JVM with NO server stops or starts, we add 27765 
> graphs to IndexTdb and immediately remove them,  repeating this process 
> several times. 
> {noformat}
>               MB    Bytes        Diff (Bytes)
> Start         193   203239424
>
> Reindex 5     249   262066176    58826752
> Reindex 6     249   262086656    20480
> Reindex 10    298   312500224    50413568
> Reindex 11    298   312520704    20480
> Reindex 12    298   312541184    20480
> Reindex 13    298   312586240    45056
> Reindex 14    306   320995328    8409088
> Reindex 15    330   346181632    25186304
> Reindex 16    330   346198538    16906
> Reindex 17    346   362999808    16801270
> Reindex 18    346   363020288    20480
> Reindex 19    346   363040768    20480
> Reindex 20    346   363061248    20480
> Reindex 21    346   363081728    20480
> Reindex 22    354   371490816    8409088
> Reindex 23    378   396677120    25186304
>
> End           193   203239424
> {noformat}
> The system starts with 193MB of data allocated by indexTdb.  A reindex 
> consists of a remove followed by an add of these graphs. As you can see from 
> the data, the size of indexTdb on disk increases dramatically after 
> repeatedly removing and adding graphs.  After Reindex 23, 378 MB of disk 
> space is used.  If Jena TDB reused allocated space, there would be no 
> need to allocate more space other than what is used by deleted node ids 
> (unless nodeid storage is eating all of this space?).  Jena does not appear 
> to be reusing the allocated disk space.  At the very end of this scenario, we 
> exported the nquads and reloaded them, which brought the disk usage back to 
> the 193MB where it started. 
> We believe Jena TDB is not reusing the space allocated by the TDB file system 
> within the same JVM.
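
For anyone who wants to reproduce the size measurements in the table above, 
here is a minimal sketch of one way to record them. It assumes the TDB 
database lives in a single directory and that summing the apparent file sizes 
is a fair proxy for disk usage. The function names are hypothetical, and this 
is not the attached TdbGrowthTests.java or test-tdb-size.sh. The MB column in 
the table appears to be bytes divided by 1024*1024 and rounded down, so the 
sketch does the same.

```python
import os


def dir_size_bytes(path):
    """Sum the apparent sizes of all regular files under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def to_mb(num_bytes):
    """Whole megabytes (1024 * 1024 bytes), rounded down."""
    return num_bytes // (1024 * 1024)


def size_diffs(sizes):
    """Growth in bytes between each pair of consecutive measurements."""
    return [later - earlier for earlier, later in zip(sizes, sizes[1:])]


if __name__ == "__main__":
    # First three byte counts from the table: Start, Reindex 5, Reindex 6.
    sizes = [203239424, 262066176, 262086656]
    for size, diff in zip(sizes, [None] + size_diffs(sizes)):
        print(to_mb(size), size, "" if diff is None else diff)
```

Calling dir_size_bytes on the database directory after each remove/add cycle 
and passing the collected values to size_diffs should give numbers comparable 
to the Bytes and Diff columns.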



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
