I've got a couple of questions about the Pig-based aggregation. Some of these may duplicate my JIRA comments, so apologies; there's no need to answer more than once.

1) Can we run the aggregation scripts in local mode? I haven't been able to get Pig to read from anything other than file:/// in local mode. Is there a trick to it?

2) Is there a good way to sanity-check my tables and make sure the data in HBase looks right? I'm not quite sure what they "should" look like.

3) What's the default epoch to start aggregating from? What happens if I don't pass START= to the command?

4) Is there a good way to find out which epoch it last started summarizing from? Is there a big cost to being over-inclusive?

--Ari

--
Ari Rabkin [email protected]
UC Berkeley Computer Science Department
