I wanted to call this "colapse", because it seemed like the perfect term, but it turns out to be widely used already: Google finds 5560 hits, some of them misspellings of "collapse" and some of them other things.
"oerlap" isn't quite as apt a term, but it isn't used for anything else, except as a rare misspelling of "overlap". This is sort of OLAPish (see http://www.olapreport.com/fasmi.htm) but not full OLAP.

The FASMI criteria are "fast", "analysis", "shared", "multidimensional", and "information": "fast" means the simplest analyses take under one second, most responses come within five seconds, and very few take more than 20 seconds; "analysis" means end users can program it to do business logic, statistical analysis, and other ad hoc calculations; "shared" means it supports reasonable security with shared read-write access; "multidimensional" means it provides a multidimensional conceptual view of the data, with hierarchies and multiple hierarchies; and "information" means it handles lots of information. oerlap is a half-assed hack at all of these.

I often have a bunch of tabular data (tens or hundreds of thousands of rows) that I want to explore interactively, and I don't have a good way to do that. I envision "oerlap" as a simple UI that makes this easy. You feed it tabular data; it presents you with a table. Initially, the table has one row, with one cell for each field in the input data. Each cell contains the three most frequent values in that field, with their respective numbers of occurrences. An extra cell indicates the number of input rows.

Clicking on a cell expands the table until it has one row for each value of that field. The rows are sorted by the number of occurrences of those values, so the first few rows are the ones that represent most of the input records. The extra cell indicating the number of input records is still there, but now it's an entire column, giving the number of input records represented by each row. The remaining un-broken-out columns are displayed as before: each cell contains the three most frequent values for that field, with their respective numbers of occurrences.
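The initial one-row view and a single breakout can be sketched with the standard library's `collections.Counter.most_common`, assuming the input fits in memory as a list of equal-length tuples. `summarize` and `break_out` are hypothetical helpers for illustration, not part of the oerlap code; unlike that code, they group whole rows instead of only keeping counts.

```python
from collections import Counter

# Hypothetical helpers sketching the display described above; they assume
# the input is a list of equal-length tuples held in memory.

def summarize(rows):
    """Return (number of rows, [three most frequent (value, count) pairs
    for each field])."""
    nfields = len(rows[0]) if rows else 0
    cells = [Counter(row[i] for row in rows).most_common(3)
             for i in range(nfields)]
    return len(rows), cells

def break_out(rows, field):
    """Return one (value, row count, summary cells) triple per distinct
    value of `field`, most frequent value first."""
    groups = {}
    for row in rows:
        groups.setdefault(row[field], []).append(row)
    # sorted() is stable, so equally frequent values keep first-seen order.
    ordered = sorted(groups.items(), key=lambda kv: len(kv[1]), reverse=True)
    return [(value,) + summarize(group) for value, group in ordered]

rows = [('a', 1, 32), ('a', 1, 33), ('b', 1, 31), ('c', 2, 30), ('a', 0, 30)]
total, cells = summarize(rows)
print(total, cells[0])                         # 5 [('a', 3), ('b', 1), ('c', 1)]
print([v for v, n, _ in break_out(rows, 0)])   # ['a', 'b', 'c']
```

Keeping every row in memory makes the sketch short, but it is heavier than counting values as they stream past, which is what the analysis code does.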
So each column is in one of two states, broken-out or summary; there is one row in the displayed table for each distinct tuple of values from the broken-out columns. Clicking on a value in a column switches that column between the broken-out and summary states. Clicking on a column header sorts the table by the values in that column; by default, it's sorted by the extra column giving the number of input rows.

The code below, in its current state, only does the analysis; it doesn't yet provide the sorting, the HTML interface, or the interactivity I envision. Maybe soon.

# incredibly powerful secret web log analysis tool
import sys
from itertools import zip_longest

def oerlap(datasrc, breakoutby):
    """Analyze data.

    Given a data source whose .next() method returns a tuple, or None
    when exhausted, and a sequence 'breakoutby' that specifies which
    fields of the tuples to break out by, count frequencies.

    Returns a dict mapping each distinct tuple of broken-out values to a
    list with one {value: count} dict per field.
    """
    results = {}
    while True:
        line = datasrc.next()
        if line is None:
            return results
        key = tuple(line[f] for f in breakoutby)
        r = results.setdefault(key, [{} for _ in line])
        if len(r) < len(line):
            # Each new field needs its own dict; [{}] * n would share one.
            r.extend({} for _ in range(len(line) - len(r)))
        # Fields missing from a short line are counted as the value None.
        for counts, value in zip_longest(r, line):
            counts[value] = counts.get(value, 0) + 1

class filelines:
    "Return the whitespace-separated fields of each line of a file."
    def __init__(self, somefile):
        self.file = somefile
    def next(self):
        line = self.file.readline()
        if line == "":
            return None
        return tuple(sys.intern(x) for x in line.split())

class arrayitems:
    "For testing. Return tuples from an array."
    def __init__(self, somearray):
        self.array = somearray
        self.ii = 0
    def next(self):
        if self.ii == len(self.array):
            return None
        item = self.array[self.ii]
        self.ii += 1
        return item

testdata = [('a', 1, 32), ('a', 1, 33), ('b', 1, 31),
            ('c', 2, 30), ('a', 0, 30)]

def test(bb=()):
    return oerlap(arrayitems(testdata), bb)
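The sorting and column toggling that aren't implemented yet could sit on top of what oerlap() returns. This is a sketch under assumptions: `table_rows` and `toggle` are hypothetical names, the row count per key is taken as the largest per-field total, and sorting "by a column" is only handled for broken-out columns, by indexing into the key tuple.

```python
# Hypothetical display layer over oerlap()-shaped results:
# {key tuple: [one {value: count} dict per field]}.

def table_rows(results, sort_by=None):
    """Return (key, input_row_count, top3_per_field) rows.

    By default rows are sorted by input_row_count, largest first;
    sort_by=i sorts by the i-th broken-out value instead (an assumption).
    """
    rows = []
    for key, fields in results.items():
        # Each input row adds one count to every per-field dict that existed
        # when it arrived, so the largest total is the number of input rows.
        n = max((sum(counts.values()) for counts in fields), default=0)
        top3 = [sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:3]
                for counts in fields]
        rows.append((key, n, top3))
    if sort_by is None:
        rows.sort(key=lambda row: row[1], reverse=True)
    else:
        rows.sort(key=lambda row: row[0][sort_by])
    return rows

def toggle(breakout, field):
    """Switch `field` between broken-out and summary state."""
    return tuple(sorted(set(breakout) ^ {field}))

# The mapping test((0,)) should return for the sample data.
results = {
    ('a',): [{'a': 3}, {1: 2, 0: 1}, {32: 1, 33: 1, 30: 1}],
    ('b',): [{'b': 1}, {1: 1}, {31: 1}],
    ('c',): [{'c': 1}, {2: 1}, {30: 1}],
}
for key, n, top3 in table_rows(results):
    print(key, n, top3[1])    # first line: ('a',) 3 [(1, 2), (0, 1)]
```

Clicking a value would then amount to calling oerlap again with `toggle(current_breakout, field)` and re-rendering; re-reading the input on every click is the simple option, and caching per breakout set would be an obvious next step.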