GitHub user anew commented on a diff in the pull request:
https://github.com/apache/incubator-tephra/pull/20#discussion_r90759195
--- Diff:
tephra-core/src/main/java/org/apache/tephra/janitor/TransactionPruningPlugin.java
---
@@ -0,0 +1,90 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.tephra.janitor;
+
+import org.apache.hadoop.conf.Configuration;
+
+import java.io.IOException;
+
+/**
+ * Data janitor interface to manage the invalid transaction list.
+ *
+ * <p/>
+ * An invalid transaction can only be removed from the invalid list after the data written
+ * by the invalid transactions has been removed from all the data stores.
+ * The term data store is used here to represent a set of tables in a database that have
+ * the same data clean up policy, like all Apache Phoenix tables in an HBase instance.
+ *
+ * <p/>
+ * Typically every data store will have a background job which cleans up the data written by invalid transactions.
+ * Prune upper bound for a data store is defined as the largest invalid transaction whose data has been
+ * cleaned up from that data store.
+ * <pre>
+ * prune-upper-bound = min(max(invalid list), min(in-progress list) - 1)
+ * </pre>
+ * where invalid list and in-progress list are from the transaction snapshot used to clean up the invalid data in the
+ * data store.
+ *
+ * <p/>
+ * There will be one such plugin per data store. The plugins will be executed as part of the Transaction Service.
+ * Each plugin will be invoked periodically to fetch the prune upper bound for its data store.
+ * Invalid transaction list can be pruned up to the minimum of prune upper bounds returned by all the plugins.
+ */
+public interface TransactionPruningPlugin {
+ /**
+ * Called once when the Transaction Service starts up.
+ *
+ * @param conf configuration for the plugin
+ */
+ void initialize(Configuration conf) throws IOException;
+
+ /**
+ * Called periodically to fetch prune upper bound for a data store. The plugin examines the state of data cleanup
+ * in the data store and determines the smallest invalid transaction whose writes no longer exist in the data
+ * store. It then returns this smallest invalid transaction as the prune upper bound for this data store.
+ *
+ * @param time start time of this prune iteration in milliseconds
+ * @param pruneUpperBoundForTime the largest invalid transaction that can be possibly removed
+ * from the invalid list for the given time.
+ * In terms of HBase, this is the smallest not in-progress transaction that will
+ * not have writes in any HBase regions that are created after the given time.
+ * The plugin will typically return a reduced upper bound based on the state of
+ * the invalid transaction data clean up in the data store.
--- End diff --
I still don't understand what this is. I thought this was an upper bound
determined by the tx manager, based on its knowledge of which invalid
transactions may still have active processes and therefore future writes?
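
For reference, here is a minimal sketch of the two computations the class Javadoc describes: the per-data-store formula `min(max(invalid list), min(in-progress list) - 1)` and the minimum taken across all plugins. It is only an illustration of how I read the formula. The class name, the method names, and the `-1` "nothing to prune" sentinel are hypothetical and are not part of Tephra.

```java
import java.util.Arrays;

// Hypothetical illustration only; none of these names exist in Tephra.
final class PruneUpperBoundSketch {

  // Assumed sentinel meaning "nothing can be pruned".
  static final long NOTHING_TO_PRUNE = -1L;

  // prune-upper-bound = min(max(invalid list), min(in-progress list) - 1),
  // computed from the snapshot used to clean up one data store.
  static long pruneUpperBound(long[] invalid, long[] inProgress) {
    if (invalid.length == 0) {
      return NOTHING_TO_PRUNE;
    }
    long maxInvalid = Arrays.stream(invalid).max().getAsLong();
    if (inProgress.length == 0) {
      // Assumption: with no in-progress transactions, only the invalid list bounds the result.
      return maxInvalid;
    }
    long minInProgress = Arrays.stream(inProgress).min().getAsLong();
    return Math.min(maxInvalid, minInProgress - 1);
  }

  // The invalid list can be pruned up to the minimum of the bounds
  // reported by all plugins (one bound per data store).
  static long minAcrossStores(long[] boundsPerStore) {
    return Arrays.stream(boundsPerStore).min().orElse(NOTHING_TO_PRUNE);
  }

  public static void main(String[] args) {
    long storeA = pruneUpperBound(new long[] {5, 10, 20}, new long[] {15, 30});
    long storeB = pruneUpperBound(new long[] {5, 10}, new long[] {});
    System.out.println(minAcrossStores(new long[] {storeA, storeB}));
  }
}
```

In this reading, an in-progress transaction at 15 caps the bound at 14 even though 20 is in the invalid list, which is the part I would like the parameter documentation to spell out.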
---