[ 
https://issues.apache.org/jira/browse/OMID-66?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16039475#comment-16039475
 ] 

Francisco Perez-Sorrosal commented on OMID-66:
----------------------------------------------

[~ruchirc]
1. Regarding the frequency of cleaning:
We can define a configuration option for the coprocessor with the corresponding 
key, similar to the one used in the current coprocessor for getting the commit 
table name:
https://github.com/apache/incubator-omid/blob/master/hbase-coprocessor/src/main/java/org/apache/omid/transaction/OmidCompactor.java#L82
e.g.
private static final String HBASE_CT_CLEANER_FREQUENCY_IN_MINUTES_KEY =
    "omid.hbase.commitTableCleaner.cleaningFrequencyInMinutes";
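
A minimal sketch of how such a key might be read, with a fallback when it is
not set. A plain Map stands in here for the HBase Configuration object that the
coprocessor would actually get from its environment, and the class name, the
method name and the default of 60 minutes are all hypothetical (the issue does
not define a default):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: a plain Map stands in for the HBase/Hadoop Configuration
// that a real coprocessor would obtain from its environment.
class CleanerFrequencyConfig {

    // Key proposed in this comment.
    static final String HBASE_CT_CLEANER_FREQUENCY_IN_MINUTES_KEY =
            "omid.hbase.commitTableCleaner.cleaningFrequencyInMinutes";

    // Hypothetical default; the issue does not specify one.
    static final int DEFAULT_FREQUENCY_IN_MINUTES = 60;

    // Returns the configured frequency, or the default if the key is absent.
    static int cleaningFrequencyInMinutes(Map<String, String> conf) {
        String raw = conf.get(HBASE_CT_CLEANER_FREQUENCY_IN_MINUTES_KEY);
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_FREQUENCY_IN_MINUTES;
        }
        int minutes = Integer.parseInt(raw.trim());
        if (minutes <= 0) {
            throw new IllegalArgumentException(
                    "Cleaning frequency must be positive, got " + minutes);
        }
        return minutes;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(cleaningFrequencyInMinutes(conf)); // default applies
        conf.put(HBASE_CT_CLEANER_FREQUENCY_IN_MINUTES_KEY, "15");
        System.out.println(cleaningFrequencyInMinutes(conf)); // configured value
    }
}
```

Rejecting zero or negative values early is a design choice of this sketch; it
keeps a misconfigured value from turning into a cleaner that never sleeps.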

2. Regarding the use of the timestamp to identify the commit table entries
older than the low watermark:
In this code
https://git.corp.yahoo.com/fperez/omid/blob/master/hbase-commit-table/src/main/java/org/apache/omid/committable/hbase/HBaseCommitTable.java#L101
you can see that we use the start timestamp as the HBase timestamp. So yes, the
HBase timestamp of each commit table entry carries the same information as its
row key, and it can be used as the identifier.
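
To illustrate the idea, here is a small in-memory sketch, not the HBase API.
A sorted map keyed by start timestamp stands in for the commit table, and the
class and method names are hypothetical. It also assumes that "older than the
low watermark" means a start timestamp strictly below it:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// In-memory model only: start timestamp -> commit timestamp.
class StaleEntrySelector {

    // Returns the start timestamps of the entries strictly below the low
    // watermark (assumption: "older than" excludes the watermark itself).
    static List<Long> entriesBelowLowWatermark(
            NavigableMap<Long, Long> commitTable, long lowWatermark) {
        return new ArrayList<>(commitTable.headMap(lowWatermark, false).keySet());
    }

    public static void main(String[] args) {
        NavigableMap<Long, Long> commitTable = new TreeMap<>();
        commitTable.put(10L, 12L);
        commitTable.put(20L, 25L);
        commitTable.put(30L, 31L);
        // With a low watermark of 20, only the entry started at 10 qualifies.
        System.out.println(entriesBelowLowWatermark(commitTable, 20L));
    }
}
```

Because the start timestamp is also the HBase timestamp, a real cleaner could
presumably apply the same comparison to the cell timestamp instead of decoding
the row key.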

3. Regarding the statistics:
I'm not sure what the best method is for reporting statistics in HBase
coprocessors. A long time ago I submitted this PR
(https://github.com/apache/incubator-omid/pull/7) to add statistics to the
current compactor coprocessor, but nobody has reviewed it yet. Do you know
whether the approach used in the PR is standard in the HBase community? If so,
we can follow the same approach here. Otherwise, for now you can just add a
log line reporting the number of entries scanned vs. removed in each
coprocessor execution.
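
A rough sketch of the log-line fallback, again over an in-memory map rather
than a real HBase table. The class name, method names and message wording are
hypothetical, and java.util.logging is used only to keep the example
self-contained:

```java
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;
import java.util.logging.Logger;

// In-memory model only: start timestamp -> commit timestamp.
class CleaningRunReport {

    private static final Logger LOG =
            Logger.getLogger(CleaningRunReport.class.getName());

    // Builds the summary line for one cleaning run.
    static String summarize(long scanned, long removed) {
        return "Commit table cleaning finished: " + scanned
                + " entries scanned, " + removed + " removed";
    }

    // Removes the entries strictly below the low watermark, logs one
    // summary line and returns it.
    static String cleanAndReport(Map<Long, Long> commitTable, long lowWatermark) {
        long scanned = 0;
        long removed = 0;
        Iterator<Map.Entry<Long, Long>> it = commitTable.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, Long> entry = it.next();
            scanned++;
            if (entry.getKey() < lowWatermark) {
                it.remove();
                removed++;
            }
        }
        String line = summarize(scanned, removed);
        LOG.info(line);
        return line;
    }

    public static void main(String[] args) {
        Map<Long, Long> commitTable = new TreeMap<>();
        commitTable.put(10L, 12L);
        commitTable.put(20L, 25L);
        commitTable.put(30L, 31L);
        cleanAndReport(commitTable, 25L);
    }
}
```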



> Clean commit table entries lower than the LWM
> ---------------------------------------------
>
>                 Key: OMID-66
>                 URL: https://issues.apache.org/jira/browse/OMID-66
>             Project: Apache Omid
>          Issue Type: Improvement
>            Reporter: Francisco Perez-Sorrosal
>            Assignee: ruchir choudhry
>            Priority: Minor
>
> The entries of the committed transactions in the commit table that clients
> were not able to remove (e.g. because they failed) stay there as trash. There
> should be a mechanism (e.g. in the form of a coprocessor) to remove those entries.
> The cleaning mechanism should run periodically, reading the current low 
> watermark and removing the commit table entries older than the read low 
> watermark. Statistics such as the number of cleaned entries would be nice to 
> have.
> So far we haven't seen degraded performance in production systems because of
> this, but it could become a problem at some point in the future.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
