That is a pity.

Good use cases with realistic (not necessarily real) data would be very
helpful.  They would probably have much more impact than small code fixes.

On Sun, Jan 13, 2013 at 5:54 PM, Florents Tselai <[email protected]> wrote:

> For now, I'm afraid no, I don't.
>
> On Mon, Jan 14, 2013 at 3:31 AM, Ted Dunning <[email protected]>
> wrote:
>
> > Do you have any sample data?
> >
> > On Sun, Jan 13, 2013 at 5:13 PM, Florents Tselai <[email protected]> wrote:
> >
> > > Thanks for the reply!
> > >
> > > Yes, you're correct: the data source is a smart meter installed in each
> > > building.
> > >
> > > On Mon, Jan 14, 2013 at 3:07 AM, Ted Dunning <[email protected]>
> > > wrote:
> > >
> > > > If you have discrete data, then I would think that simple cooccurrence
> > > > mining would be more useful than full-on association mining.
> > > >
> > > > But is your data really a time-series?  Are you extracting discrete
> > > > features from the time series?
> > > >
> > > > In the following, I am assuming that when you say "real-time energy data"
> > > > you actually mean something like smart meter consumption data for
> > > > electricity.  You could probably mean total energy emitted by a particular
> > > > set of three thousand quasars as well, but I assume the former is more
> > > > likely.  Please correct me if you like.
> > > >
> > > >
> > > > One very useful approach that I have seen with time series uses past data
> > > > to predict the next sample (in the sense of regression).  If you have such
> > > > a regression model you can use Bayesian model clustering to find multiple
> > > > patterns for regression.  The output of this clustering is useful as the
> > > > continuous equivalent of association mining.
> > > >
> > > > To be more concrete, suppose that you have several kinds of energy
> > > > customers:
> > > >
> > > > - normal consumers who leave their house empty during the day, but have a
> > > > substantial bump in energy consumption in the late afternoon or evening and
> > > > then have a more spread-out pattern of usage on the weekend.
> > > >
> > > > - normal consumers who work a night shift
> > > >
> > > > - light offices which have peak usage during normal working hours
> > > >
> > > > - light industry with shift work that has relatively constant energy usage
> > > >
> > > > If you build models for the energy consumption of these customers
> > > > normalized to their previous week's total consumption and have the
> > > > following features
> > > >
> > > > - time of day expressed as 4 sinusoids
> > > >
> > > > - day of week expressed as a 1 of 7 indicator
> > > >
> > > > - weekend expressed as a boolean
> > > >
> > > > I think that you will find that Bayesian model clustering will recover your
> > > > original classes very nicely.
> > > >
> > > >
> > > >
> > > >
> > > > On Sun, Jan 13, 2013 at 3:41 PM, Florents Tselai <[email protected]> wrote:
> > > >
> > > > > Real-time energy data.
> > > > > Association mining is in fact the core analysis applied (but not the only
> > > > > one; e.g., it could be classification as well).
> > > > >
> > > > > On Mon, Jan 14, 2013 at 1:34 AM, Ted Dunning <[email protected]> wrote:
> > > > >
> > > > > > Can you say more about what kind of data and what kind of analysis?
> > > > > >
> > > > > > It is usually best if the work you do is motivated by your needs.
> > > > > >
> > > > > > On Sun, Jan 13, 2013 at 3:18 PM, Florents Tselai <[email protected]> wrote:
> > > > > >
> > > > > > > Hello,
> > > > > > >
> > > > > > > In the next weeks/months I'll be using Mahout for analyzing some big data
> > > > > > > for a start-up and I'd like my work there to be also reflected in Mahout.
> > > > > > > So I'd like to be a committer. I've already read all the wikis, guidelines
> > > > > > > and have browsed through the JIRA issues.
> > > > > > >
> > > > > > > Firstly, I'd like to have a GOOD overview of the codebase and the overall
> > > > > > > design.
> > > > > > > So, my first thought was to start doing some refactorings (decomposing
> > > > > > > methods and so on).
> > > > > > >
> > > > > > > Is there a specific place in the code that needs "cleaning"?
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

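The feature set suggested earlier in the thread (time of day as 4 sinusoids,
day of week as a 1 of 7 indicator, weekend as a boolean, with consumption
normalized to the previous week's total) could be sketched roughly as follows.
This is only an illustration, not code from Mahout or from anyone in the
thread. The function names are hypothetical, and reading "4 sinusoids" as
sine and cosine at the 24-hour period plus its 12-hour harmonic is an
assumption; other choices of frequency or phase would fit the description
equally well.

```python
import math
from datetime import datetime
from typing import List


def time_features(ts: datetime) -> List[float]:
    """Encode a timestamp as 4 sinusoids + 1-of-7 day indicator + weekend flag."""
    # Fraction of the day that has elapsed, in [0, 1).
    day_frac = (ts.hour + ts.minute / 60.0 + ts.second / 3600.0) / 24.0
    angle = 2.0 * math.pi * day_frac

    # ASSUMPTION: "4 sinusoids" = sin/cos at the 24 h period and its 12 h harmonic.
    sinusoids = [
        math.sin(angle),
        math.cos(angle),
        math.sin(2.0 * angle),
        math.cos(2.0 * angle),
    ]

    # 1-of-7 indicator; datetime.weekday() returns 0 for Monday .. 6 for Sunday.
    day_of_week = [0.0] * 7
    day_of_week[ts.weekday()] = 1.0

    # Weekend flag (Saturday or Sunday), stored as 0.0 / 1.0.
    weekend = 1.0 if ts.weekday() >= 5 else 0.0

    return sinusoids + day_of_week + [weekend]


def normalize_reading(reading: float, previous_week_total: float) -> float:
    """Scale one meter reading by the customer's total for the previous week."""
    if previous_week_total <= 0:
        raise ValueError("previous week's total must be positive")
    return reading / previous_week_total
```

Each reading then becomes a 12-element feature vector plus a normalized
target, which is the kind of input a per-customer regression model would
need before any model clustering is attempted.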