[ 
https://issues.apache.org/jira/browse/JENA-327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13463125#comment-13463125
 ] 

Simon Helsen commented on JENA-327:
-----------------------------------

"what counts as 'extremely large stores' in triple/quad count?" 

50 million+, i.e. many gigabytes of disk space

"A READ action does not depend on internal characteristics so it is the most 
stable option"

OK, so let me make sure I get this right: you are saying that the best way to 
perform online backups is to do a quad dump of the dataset inside a READ 
transaction, and that this is guaranteed to remain safe over time? 

I am not following "There is no need to flush the journal - just back it up 
like everything else". Is this separately required when backing up from the 
regular dataset in the READ transaction? Or was this comment referring to a 
separate option? 


                
> TDB Tx transaction lock to permit backups
> -----------------------------------------
>
>                 Key: JENA-327
>                 URL: https://issues.apache.org/jira/browse/JENA-327
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: TDB
>    Affects Versions: TDB 0.9.4
>            Reporter: Simon Helsen
>
> With large repositories, it is important to be able to create backups once in 
> a while, because recreating an RDF store with millions of triples can 
> be prohibitively expensive. Moreover, it should be possible to take those 
> backups while still allowing read activity on the store, since in many cases a 
> complete shutdown is not possible. Before the introduction of Tx, it 
> was relatively straightforward to provide the right locks on the client side 
> to safely suspend any disk activity for long enough to make a 
> backup of the index. 
> However, since Tx, things have become slightly more complicated because TDB 
> Tx touches the disk at times other than when performing write/sync 
> activities. Right now, given some understanding of how TDB Tx is 
> implemented, it is still possible for clients to avoid disk activity while 
> running a backup process, but this dependency on TDB Tx implementation 
> details is fragile. Moreover, we anticipate that in the future, the 
> merging process from the journal into the main index may become entirely 
> asynchronous for performance reasons. The moment that happens, clients have no 
> control anymore over when the disk is being touched.
> For this reason, we are requesting the following feature: a "backup" lock (for 
> lack of a better name). Its semantics are that when the lock is taken, TDB Tx 
> guarantees that no disk activity takes place, pausing activities if necessary. 
> In other words, no write transaction should be able to complete 
> and read transactions will not attempt to merge the journal. The idea is 
> that regular read activities can still continue. The API could be as 
> simple as something like this:
> try {
>     dataset.begin(ReadWrite.BACKUP) ;  // proposed new mode, part of this request
>     // ... do whatever is necessary to back up the index ...
> } finally {
>     dataset.end() ;
> }
> As for the implementation, we suspect you currently have locks in place which 
> could be used to guarantee this behavior. E.g. could 
> txn.getBaseDataset().getLock().enterCriticalSection(Lock.WRITE) be sufficient?
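
To make the requested semantics concrete, here is a minimal toy model in plain 
Java. It is not TDB code: the class BackupLockSketch, its methods, and the 
counter standing in for on-disk state are all hypothetical, and it uses 
java.util.concurrent.locks.StampedLock rather than any lock TDB actually 
holds. Disk-touching work shares one side of the lock, a backup takes the 
exclusive side, and plain reads never touch the lock at all.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.StampedLock;

/** Toy model of the requested "backup" lock. Hypothetical; not TDB code. */
final class BackupLockSketch {
    // Disk-touching work takes the shared (read) side of this lock;
    // a backup takes the exclusive (write) side.
    private final StampedLock diskLock = new StampedLock();

    // Stands in for the on-disk state that a backup would copy.
    private final AtomicInteger committed = new AtomicInteger();

    /**
     * Models a write transaction trying to complete. Returns false while a
     * backup holds the lock. A real implementation would presumably pause
     * the commit instead of refusing it; refusing keeps the demo non-blocking.
     */
    boolean tryCommit() {
        long stamp = diskLock.tryReadLock();
        if (stamp == 0L) {
            return false; // backup in progress: no disk activity allowed
        }
        try {
            committed.incrementAndGet();
            return true;
        } finally {
            diskLock.unlockRead(stamp);
        }
    }

    /** Models a regular read: it never takes the disk lock. */
    int read() {
        return committed.get();
    }

    /** Takes the backup lock; blocks until in-flight commits have finished. */
    long beginBackup() {
        return diskLock.writeLock();
    }

    /** Releases the backup lock taken by beginBackup(). */
    void endBackup(long stamp) {
        diskLock.unlockWrite(stamp);
    }

    public static void main(String[] args) {
        BackupLockSketch store = new BackupLockSketch();
        store.tryCommit();                     // one commit before the backup
        long stamp = store.beginBackup();
        try {
            // Commits are refused, reads still work.
            System.out.println("commit during backup: " + store.tryCommit());
            System.out.println("read during backup:   " + store.read());
        } finally {
            store.endBackup(stamp);
        }
        System.out.println("commit after backup:  " + store.tryCommit());
    }
}
```

The try/finally shape mirrors the API proposed in the description. Whether 
the existing dataset lock could play the role of diskLock here is exactly the 
open question above.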

