That is a pity. Good use cases with realistic (not necessarily real) data would be very helpful, and would probably have much more impact than small code fixes.
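One way to get realistic (not necessarily real) data is to generate it. Below is a minimal sketch of that idea. It makes several assumptions: hourly readings, a series that starts on a Monday, and hypothetical load shapes for the four customer types described further down in this thread. The profile values, the noise model, and the function names `base_load` and `synthetic_series` are all illustrative assumptions. They are not measured data or Mahout code.

```python
import random


# Hypothetical base load (kWh per hour) for the four customer types from the
# thread. Shapes and magnitudes are illustrative assumptions, not real data.
def base_load(kind, hour, weekday):
    weekend = weekday >= 5  # weekday: 0 = Monday ... 6 = Sunday
    if kind == "day_worker":
        if weekend:
            # more spread-out usage over the waking hours
            return 1.0 if 8 <= hour <= 22 else 0.4
        # house empty during the day, bump in the late afternoon/evening
        return 2.0 if 17 <= hour <= 22 else 0.4
    if kind == "night_shift":
        # assumed: away at work overnight (22:00-06:00), at home during the day
        return 0.3 if (hour >= 22 or hour < 6) else 1.2
    if kind == "office":
        # peak during normal working hours on weekdays only
        return 3.0 if (not weekend and 8 <= hour <= 18) else 0.3
    if kind == "light_industry":
        # shift work keeps usage roughly constant around the clock
        return 5.0
    raise ValueError(f"unknown customer kind: {kind}")


def synthetic_series(kind, weeks=2, noise=0.1, seed=0):
    """Hourly consumption for `weeks` weeks, starting on a Monday (assumed),
    with multiplicative Gaussian noise, clipped so values stay non-negative."""
    rng = random.Random(seed)
    series = []
    for day in range(7 * weeks):
        weekday = day % 7
        for hour in range(24):
            value = base_load(kind, hour, weekday) * (1.0 + rng.gauss(0.0, noise))
            series.append(max(0.0, value))
    return series
```

Seeding the generator keeps the data reproducible, which matters if the data is meant to back a shared use case or a test.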
On Sun, Jan 13, 2013 at 5:54 PM, Florents Tselai wrote:
> For now, I'm afraid I don't.
>
> On Mon, Jan 14, 2013 at 3:31 AM, Ted Dunning wrote:
>
> > Do you have any sample data?
> >
> > On Sun, Jan 13, 2013 at 5:13 PM, Florents Tselai wrote:
> >
> > > Thanks for the reply!
> > >
> > > Yes, you're correct: the data source is a smart meter installed in each building.
> > >
> > > On Mon, Jan 14, 2013 at 3:07 AM, Ted Dunning wrote:
> > >
> > > > If you have discrete data, then I would think that simple cooccurrence mining would be more useful than full-on association mining.
> > > >
> > > > But is your data really a time series? Are you extracting discrete features from the time series?
> > > >
> > > > In the following, I am assuming that when you say "real-time energy data" you actually mean something like smart-meter consumption data for electricity. You could also mean the total energy emitted by a particular set of three thousand quasars, but I assume the former is more likely. Please correct me if you like.
> > > >
> > > > One very useful approach that I have seen with time series uses past data to predict the next sample (in the sense of regression). If you have such a regression model, you can use Bayesian model clustering to find multiple regression patterns. The output of this clustering is useful as the continuous equivalent of association mining.
> > > >
> > > > To be more concrete, suppose that you have several kinds of energy customers:
> > > >
> > > > - normal consumers who leave their house empty during the day, but have a substantial bump in energy consumption in the late afternoon or evening and then a more spread-out pattern of usage on the weekend
> > > > - normal consumers who work a night shift
> > > > - light offices, which have peak usage during normal working hours
> > > > - light industry with shift work, which has relatively constant energy usage
> > > >
> > > > If you build models for the energy consumption of these customers, normalized to their previous week's total consumption, with the following features:
> > > >
> > > > - time of day expressed as 4 sinusoids
> > > > - day of week expressed as a 1-of-7 indicator
> > > > - weekend expressed as a boolean
> > > >
> > > > I think that you will find that Bayesian model clustering will recover your original classes very nicely.
> > > >
> > > > On Sun, Jan 13, 2013 at 3:41 PM, Florents Tselai wrote:
> > > >
> > > > > Real-time energy data.
> > > > > Association mining is in fact the core analysis applied, but not the only one; classification, for example, could be applied as well.
> > > > >
> > > > > On Mon, Jan 14, 2013 at 1:34 AM, Ted Dunning wrote:
> > > > >
> > > > > > Can you say more about what kind of data and what kind of analysis?
> > > > > >
> > > > > > It is usually best if the work you do is motivated by your needs.
> > > > > >
> > > > > > On Sun, Jan 13, 2013 at 3:18 PM, Florents Tselai wrote:
> > > > > >
> > > > > > > Hello,
> > > > > > >
> > > > > > > In the next weeks/months I'll be using Mahout to analyze some big data for a start-up, and I'd like my work there to also be reflected in Mahout. So I'd like to become a committer. I've already read all the wikis and guidelines and have browsed through the JIRA issues.
> > > > > > >
> > > > > > > First, I'd like to get a GOOD overview of the codebase and the overall design. So my first thought was to start doing some refactorings (decomposing methods and so on).
> > > > > > >
> > > > > > > Is there a specific place in the code that needs "cleaning"?
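Ted's suggested setup uses consumption normalized to the previous week's total, plus three features: time of day as 4 sinusoids, day of week as a 1-of-7 indicator, and a weekend boolean. The sketch below shows roughly how that encoding could look. It is a minimal illustration, not Mahout code, and it rests on several assumptions. It assumes evenly spaced hourly readings. It takes "previous week" to mean the trailing 168 hours. It reads the 4 sinusoids as the sine and cosine of the 24-hour cycle and of its 12-hour harmonic. All function names are hypothetical.

```python
import math
from datetime import datetime

SAMPLES_PER_WEEK = 168  # assumes one reading per hour


def time_of_day_features(ts):
    # Assumed reading of "4 sinusoids": sin/cos of the 24-hour cycle and of
    # its 12-hour harmonic, so 23:59 and 00:00 map to nearby feature values.
    frac = (ts.hour * 3600 + ts.minute * 60 + ts.second) / 86400.0
    angle = 2.0 * math.pi * frac
    return [math.sin(angle), math.cos(angle),
            math.sin(2.0 * angle), math.cos(2.0 * angle)]


def day_of_week_features(ts):
    # 1-of-7 indicator; datetime.weekday() gives Monday = 0 ... Sunday = 6
    one_hot = [0.0] * 7
    one_hot[ts.weekday()] = 1.0
    return one_hot


def weekend_feature(ts):
    return [1.0 if ts.weekday() >= 5 else 0.0]


def feature_vector(ts):
    """12 features: 4 sinusoids + 7 day-of-week indicators + 1 weekend flag."""
    return time_of_day_features(ts) + day_of_week_features(ts) + weekend_feature(ts)


def normalize_by_previous_week(readings, samples_per_week=SAMPLES_PER_WEEK):
    """Divide each reading by the total of the preceding week of readings.
    The first week has no history, so output index 0 corresponds to
    readings[samples_per_week]. A zero-consumption week yields 0.0."""
    normalized = []
    for i in range(samples_per_week, len(readings)):
        total = sum(readings[i - samples_per_week:i])
        normalized.append(readings[i] / total if total > 0 else 0.0)
    return normalized


if __name__ == "__main__":
    # Sunday, Jan 13, 2013 at 18:00
    print(feature_vector(datetime(2013, 1, 13, 18, 0)))
```

The weekend flag is exactly the sum of the Saturday and Sunday indicators, so it adds no new information to a plain linear model. In an unregularized regression with an intercept it is redundant. A regularized model tolerates the collinearity, though.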
