Hi all,

Consider a set S(X) = {a, b, c, d, e, f, ...}. I compute the elements of the set over multiple MapReduce (MR) job iterations, i.e. several MR jobs are run one after another. Each iteration computes a subset of the elements, so the set is built up incrementally. I am using HBase to store the result. For this scenario, my design is as follows.

Schema Design:
- S(X) is the row key.
- Each element is a column in the column family. The column label (qualifier) is the iteration number followed by a number indicating the position of the element within that iteration's subset.

E.g., suppose iteration 1 computes the subset {a, b}. Then the row would be S(X) = {contains: {1.1: a, 1.2: b}}, where contains is the name of the column family. I can add the results of subsequent iterations (other subsets) to S(X) by adding more columns.

Would this design be appropriate for the above scenario? There would be many sets S(X) (X can be X1, X2, X3, ...) and many elements in each set.

Filtering:
To retrieve all the sets S(X), a range fetch should be performed. I wouldn't know the start key and end key, because the number of S(X) sets is not known beforehand. Can I use PrefixFilter for this, by setting the prefix to 'S'?

Thank you in advance.

Regards,
Raghava.
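
P.S. To make the layout concrete, here is a rough sketch in plain Java. It is not HBase client code: a sorted TreeMap stands in for the table, and the class name SetTableSketch, the helper names, the zero-padded qualifier format and the T(Y1) row are all made up for illustration. It assumes that row keys and column qualifiers are kept in sorted (lexicographic) order, which is why an unpadded label such as 10.1 would sort before 2.1, and why a prefix scan can seek to the prefix and stop at the first key that no longer matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.NavigableMap;
import java.util.TreeMap;

// Plain-Java model of the proposed layout; no HBase API is used here.
class SetTableSketch {
    // row key -> (column qualifier -> element), both levels kept sorted.
    final NavigableMap<String, NavigableMap<String, String>> rows = new TreeMap<>();

    // Hypothetical qualifier format "<iteration>.<position>", zero-padded
    // so that iteration 10 sorts after iteration 2 in lexicographic order.
    static String qualifier(int iteration, int position) {
        return String.format(Locale.ROOT, "%04d.%04d", iteration, position);
    }

    // Add one element computed in the given iteration to the row for rowKey.
    void put(String rowKey, int iteration, int position, String element) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .put(qualifier(iteration, position), element);
    }

    // Return the row keys that start with prefix. Because keys are sorted,
    // matching keys are contiguous: seek to the prefix, stop at the first miss.
    List<String> prefixScan(String prefix) {
        List<String> hits = new ArrayList<>();
        for (String key : rows.tailMap(prefix, true).keySet()) {
            if (!key.startsWith(prefix)) {
                break; // nothing later in sorted order can match
            }
            hits.add(key);
        }
        return hits;
    }

    public static void main(String[] args) {
        SetTableSketch table = new SetTableSketch();
        table.put("S(X1)", 1, 1, "a");  // iteration 1 computed {a, b}
        table.put("S(X1)", 1, 2, "b");
        table.put("S(X1)", 2, 1, "c");  // a later iteration adds more columns
        table.put("S(X2)", 1, 1, "d");
        table.put("T(Y1)", 1, 1, "z");  // hypothetical row of another kind
        System.out.println(table.prefixScan("S"));
        System.out.println(table.rows.get("S(X1)"));
    }
}
```

In this model a prefix of 'S' only narrows the result if the table also holds rows that do not start with 'S'; if every row is an S(X) row, a plain scan over the whole table would return the same thing.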