Re: Where to start refactoring?

Ted Dunning Sun, 13 Jan 2013 17:08:53 -0800

If you have discrete data, then I would think that simple cooccurrence
mining would be more useful than full-on association mining.
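
As a rough illustration of what simple cooccurrence mining means here, the sketch below counts how often each pair of discrete items shows up together. The event names and the idea of grouping events per household-day are made up for the example, and this is plain pair counting rather than any particular Mahout job.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(baskets):
    """Count how often each unordered pair of items appears in the same basket."""
    counts = Counter()
    for basket in baskets:
        # Deduplicate and sort so every pair has one canonical ordering.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


# Hypothetical discrete events observed per household-day (illustrative only).
baskets = [
    ["heater_on", "evening_peak"],
    ["heater_on", "evening_peak", "weekend"],
    ["weekend", "midday_peak"],
]
counts = cooccurrence_counts(baskets)
```

Raw counts like these are usually only a starting point; pairs that are merely frequent on their own would still need to be discounted by some significance test.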

But is your data really a time-series?  Are you extracting discrete
features from the time series?

In the following, I am assuming that when you say "real-time energy data"
you actually mean something like smart meter consumption data for
electricity.  You could conceivably mean total energy emitted by a particular
set of three thousand quasars as well, but I assume the former is more
likely.  Please correct me if I am wrong.

One very useful approach that I have seen with time series uses past data
to predict the next sample (in the sense of regression).  If you have such
a regression model you can use Bayesian model clustering to find multiple
patterns for regression.  The output of this clustering is useful as the
continuous equivalent of association mining.
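
To make the "predict the next sample from past data" step concrete, here is a minimal sketch that fits a first-order autoregression by ordinary least squares. It covers only the regression part, not the Bayesian model clustering; the function names fit_ar1 and predict_next and the choice of a single lag are assumptions made for the example.

```python
def fit_ar1(series):
    """Least-squares fit of series[t + 1] ~ a * series[t] + b."""
    xs, ys = series[:-1], series[1:]
    n = len(xs)
    if n < 2:
        raise ValueError("need at least three samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("series is constant; slope is undefined")
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b


def predict_next(series, a, b):
    """Predict the sample that follows the last observed value."""
    return a * series[-1] + b
```

A real model would use more lags plus calendar features, and the clustering would then group customers whose fitted models look alike.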

To be more concrete, suppose that you have several kinds of energy
customers:

- normal consumers who leave their house empty during the day, but have a
substantial bump in energy consumption in the late afternoon or evening and
then have a more spread-out pattern of usage on the weekend.

- normal consumers who work a night shift

- light offices which have peak usage during normal working hours

- light industry with shift work that has relatively constant energy usage

If you build models for the energy consumption of these customers
normalized to their previous week's total consumption and have the
following features

- time of day expressed as 4 sinusoids

- day of week expressed as a 1 of 7 indicator

- weekend expressed as a boolean

I think that you will find that Bayesian model clustering will recover your
original classes very nicely.
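
The three features listed above could be encoded roughly as follows. Reading "4 sinusoids" as sine and cosine at the 24-hour period and its first harmonic is an assumption, and the function name encode_features is made up for the example.

```python
import math
from datetime import datetime


def encode_features(ts):
    """Return 12 features: 4 time-of-day sinusoids, 7 day indicators, 1 weekend flag."""
    hour = ts.hour + ts.minute / 60.0 + ts.second / 3600.0
    angle = 2.0 * math.pi * hour / 24.0
    # Assumption: sin/cos at the daily period and at twice that frequency.
    time_of_day = [
        math.sin(angle),
        math.cos(angle),
        math.sin(2.0 * angle),
        math.cos(2.0 * angle),
    ]
    dow = ts.weekday()  # Monday is 0, Sunday is 6
    day_of_week = [1.0 if i == dow else 0.0 for i in range(7)]
    weekend = [1.0 if dow >= 5 else 0.0]
    return time_of_day + day_of_week + weekend
```

The sinusoids keep 23:59 and 00:00 close together in feature space, which a raw hour-of-day number would not.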

On Sun, Jan 13, 2013 at 3:41 PM, Florents Tselai <[email protected]> wrote:

> Real-time energy data,
> Association mining is in fact the core analysis applied (but not the only
> one, e.g. it could be classification as well).
>
> On Mon, Jan 14, 2013 at 1:34 AM, Ted Dunning <[email protected]>
> wrote:
>
> > Can you say more about what kind of data and what kind of analysis?
> >
> > It is usually best if the work you do is motivated by your needs.
> >
> > On Sun, Jan 13, 2013 at 3:18 PM, Florents Tselai
> > <[email protected]> wrote:
> >
> > > Hello,
> > >
> > > In the next weeks/months I'll be using mahout for analyzing some big data
> > > for a start-up and I'd like my work there to be also reflected in mahout.
> > > So I'd like to be a committer. I've already read all the wikis, guidelines
> > > and have browsed through the jira issues.
> > >
> > > Firstly, I'd like to have a GOOD overview of the codebase and the overall
> > > design.
> > > So, my first thought was to start doing some refactorings (decomposing
> > > methods and so on).
> > >
> > > Is there a specific place in the code that needs "cleaning"?
> > >
> >
>
