I think I have a basic understanding of encoding. In this case, integers with nearby values should share some bits, but categories should not share any bits (at least initially), because category 57 isn't necessarily related to category 58. If it turns out that category 97 is related to category 23, you'd get better predictions by having them share some bits. I guess you could use the Spatial Pooler to find these relationships, then change the encoder to reflect them.
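To make the bit-sharing idea concrete, here's a minimal sketch in plain Python. It is not NuPIC's encoder API; the function names and the sizes (n_bits=64, w=9) are assumptions I picked just for illustration. Nearby integers get overlapping runs of active bits, while each category gets its own disjoint block.

```python
def encode_scalar(value, min_val, max_val, n_bits=64, w=9):
    """Return the set of active bit indices for a number.

    A contiguous run of w bits slides along the n_bits-wide output as the
    value grows, so nearby values share most of their active bits.
    (Hypothetical helper, not a NuPIC function.)
    """
    n_buckets = n_bits - w + 1
    value = min(max(value, min_val), max_val)  # clip to the allowed range
    fraction = (value - min_val) / float(max_val - min_val)
    start = int(round(fraction * (n_buckets - 1)))
    return set(range(start, start + w))


def encode_category(index, w=9):
    """Return the set of active bit indices for a category.

    Each category owns its own block of w bits, so two different
    categories never share a bit. (Hypothetical helper.)
    """
    return set(range(index * w, (index + 1) * w))


def overlap(a, b):
    """Number of active bits two encodings have in common."""
    return len(a & b)


if __name__ == "__main__":
    # Integers 57 and 58 are close, so their encodings overlap heavily.
    print(overlap(encode_scalar(57, 0, 100), encode_scalar(58, 0, 100)))
    # Categories 57 and 58 are unrelated by default, so they share nothing.
    print(overlap(encode_category(57), encode_category(58)))
```

If category 97 later turned out to be related to category 23, one option would be to hand-edit their bit sets so they share a few indices, but that part is speculation on my side.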

On Fri, Aug 8, 2014 at 9:39 AM, cogmission1 . <[email protected]> wrote:

> > Each row is data related to a single display of an advertisement.
> > You're trying to predict whether the ad will be clicked or not.
>
> Ryan, I'm also new here. What I've seen and gleaned from discussions related to shaping the encoding of input revolves around the understanding of how the input data expresses semantic meaning. The type of encoding would be based on the dimensions of the data (how many variations can be expected in input types), together with how many bits are needed given the number of bits used to express the breadth of variation (window size of the bits etc.). As long as the encoding can differentiate that, you can just let the HTM "discover" the various relationships and distinctions in the data (i.e. it will just work).
>
> Following this you could work backwards once trained, and figure out what it all means?
>
> David
>
> On Fri, Aug 8, 2014 at 8:18 AM, Ryan Belcher <[email protected]> wrote:
>
>> I think the "urge to click" depends on the person browsing and the content of the ad. The words on the page act as a proxy for the person. If you search for "brake pads" then it's very likely you're a person looking for brake pads. But now the ad companies are collecting more and more information about people, so the words on the page aren't needed as much. They know you're interested in diapers even if the page has nothing to do with babies.
>>
>> None of that matters for the Criteo competition since they're not saying what any of the data means. The fields are named I1, I2, C1, etc. So all you can do is look for correlations in the data.
>>
>> On Fri, Aug 8, 2014 at 8:58 AM, David Ray <[email protected]> wrote:
>>
>>> That seems to be assuming that the "urge to click" is somehow related to the pattern associated with the occurrence of words on a page? This could be true and it would be interesting to find a correlation.
>>>
>>> You could maybe come up with a general theory for "click attraction" and patterns associated with word occurrence and web browsing in general....
>>>
>>> On Aug 8, 2014, at 7:44 AM, Ryan Belcher <[email protected]> wrote:
>>>
>>> I'm looking at the Criteo Kaggle competition. Each row is data related to a single display of an advertisement. You're trying to predict whether the ad will be clicked or not.
>>>
>>> Am I trying to categorize? Yes and no. I'm trying to predict whether the ad will be clicked, but the way I'm trying to do that is by categorizing the rows into buckets and calculating probability based on the category.
>>>
>>> I'm not sure how else you'd go about it.
>>>
>>> On Thu, Aug 7, 2014 at 5:44 PM, Jim Bridgewater <[email protected]> wrote:
>>>
>>>> Hi Ryan,
>>>>
>>>> For classification problems it sounds like you are headed in the right direction, but I'm unclear about what your objective is. Are you just trying to categorize each row in the data set?
>>>>
>>>> On Thu, Aug 7, 2014 at 1:33 PM, Ryan Belcher <[email protected]> wrote:
>>>> > I've been playing around with NuPIC for a while and am still trying to wrap my head around how to use it. Right now I'm playing with some prediction scenarios where you have a number of input fields and you're trying to predict one output.
>>>> >
>>>> > My understanding is that if the inputs aren't related temporally, then it's a Spatial Pooling problem. If there are common patterns in the data, then it may be helpful to create hierarchies of SPs.
>>>> >
>>>> > The data I'm looking at right now probably doesn't have common patterns. It's basically a bunch of categorical data from which you're trying to predict a boolean outcome. There are about 15M rows in the training set.
>>>> >
>>>> > So my thinking is to create 1 SP where the inputDimensions is wide enough to accommodate all of the fields and columnDimensions sized so that rows get grouped together. (If there were 100k columns, then on average 150 rows would be pooled together.)
>>>> >
>>>> > In theory I could run all of the training data through the SP, then run it through again (without learning) and calculate an outcome probability for each column. Then I could run the test data through and its probability would be the probability of the column it matches.
>>>> >
>>>> > Is that a reasonable approach or am I way out in left field?
>>>> >
>>>> > Thanks,
>>>> > Ryan
>>>> >
>>>> > _______________________________________________
>>>> > nupic mailing list
>>>> > [email protected]
>>>> > http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
>>>>
>>>> --
>>>> James Bridgewater, PhD
>>>> Arizona State University
>>>> 480-227-9592
