Sure, I can look that up. I need to dig around for it.

--Subutai

On Tue, Apr 8, 2014 at 12:52 PM, Marek Otahal <[email protected]> wrote:

> Subutai,
> do you think you'd still dig up some paper or data from your prev.
> experiments? Would be interesting!
> Cheers,
>
>
> On Tue, Apr 8, 2014 at 5:35 PM, Subutai Ahmad <[email protected]> wrote:
>
>> Hi Julie,
>>
>> Just to add to everyone else's input, this is a great application area
>> for CLA's. I did some similar work a couple of years ago and got pretty
>> good results.
>>
>> In terms of encoders, the simplest is to just use the OPF and use the
>> "string" field type instead of float. Every new string that is encountered
>> will automatically get a new random representation. With this scheme each
>> new string will be treated as a completely unique token with no semantic
>> similarity to other URL's. You'll want to make sure the string doesn't
>> contain extraneous stuff since any difference will lead to a new
>> representation.
>>
>> You could break each URL into multiple fields as you suggested. Just make
>> each one a separate CSV field and each field into a string type. I think
>> this will achieve an effect that is similar to Chetan's suggestion. In my
>> experiment each URL represented a news article and had a natural "topic"
>> associated with it such as "business" or "politics" so I had a "topic"
>> field.
>>
>> For best results I would recommend starting with a smaller dataset with a
>> relatively small number of unique strings and then work your way up from
>> there. The amount of data you need to get good results will grow fast as
>> the number of unique strings increases. You'll probably want to swarm on
>> the dataset as the parameters may need to be quite different from the
>> default hotgym parameters.
>>
>> I'm curious to see how this goes. Please send along your results and
>> questions as you make progress!
>>
>> --Subutai
>>
>>
>> On Thu, Apr 3, 2014 at 4:43 PM, Julie Pitt <[email protected]> wrote:
>>
>>> I am tinkering with the CLA a bit and want to play around with web
>>> browsing history data.
>>>
>>> I'm trying to determine whether it would be feasible to predict the URL,
>>> or at least the top-level domain that is most likely to be visited next by
>>> a web surfer, based on their past browsing history. I might go so far as to
>>> make a multi-step prediction to short-circuit the navigation of a web
>>> surfer to directly the page they are interested in.
>>>
>>> First of all, I'm looking for feedback on whether this idea even makes
>>> sense as an application of the CLA, and whether anyone has tried something
>>> similar.
>>>
>>> Second, I'm a little bit stuck coming up with a good way to encode a URL
>>> for input to the SP. One thought is to break the URL into component fields
>>> (e.g., top-level domain, URL path and params). The problem is that the
>>> encoding should be adaptive and pick up values that have never been seen
>>> before. I'm uncertain how to approach this.
>>>
>>> Since there's no semantic similarity to be inferred between two
>>> different TLDs with similar names, a basic numeric encoding doesn't make
>>> sense.
>>>
>>> It might be reasonable to think that different URL paths with the same
>>> TLD and subdomain have some semantic similarity (e.g.,
>>> maps.google.com/usa and maps.google.com/canada are both maps). I would
>>> also suggest that if two URLs share some path elements, they are even more
>>> similar. So ideally, I would come up with an encoding that has little or no
>>> overlap for different TLDs, more overlap with same TLDs and subdomain, and
>>> even more if they have the same TLD, subdomain and share path elements.
>>>
>>> Thoughts?
>>>
>>> _______________________________________________
>>> nupic mailing list
>>> [email protected]
>>> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
>>>
>
> --
> Marek Otahal :o)
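
Subutai's suggestion of one CSV field per URL component, each typed as a string, could be prepared along these lines. This is only a rough sketch under stated assumptions: `url_to_fields` and `write_history_csv` are hypothetical helpers, the field names are made up, the "domain" is naively taken as the last two host labels (which is wrong for suffixes such as .co.uk), and the three header rows (names, types, flags) are assumed to follow the layout of the hotgym example data. The query string is dropped and the host lower-cased, since any stray difference in a string produces a brand new representation.

```python
import csv
import io
from urllib.parse import urlsplit


def url_to_fields(url):
    """Split a URL into (domain, subdomain, path) strings.

    Hypothetical helper: "domain" is simply the last two host labels.
    """
    # urlsplit only recognises the host when a scheme or "//" is present.
    if "://" not in url:
        url = "//" + url
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    labels = host.split(".")
    domain = ".".join(labels[-2:])
    subdomain = ".".join(labels[:-2])
    # Ignore the query string and any trailing slash so that equivalent
    # URLs map to the same string token.
    path = parts.path.rstrip("/") or "/"
    return domain, subdomain, path


def write_history_csv(urls, out):
    """Write one row per visited URL, preceded by three header rows."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["domain", "subdomain", "path"])   # field names (made up)
    writer.writerow(["string", "string", "string"])    # field types
    writer.writerow(["", "", ""])                      # no special flags
    for url in urls:
        writer.writerow(url_to_fields(url))


buf = io.StringIO()
write_history_csv(
    ["maps.google.com/usa", "https://maps.google.com/canada?x=1"], buf
)
print(buf.getvalue())
```

Starting with a handful of distinct domains and paths, as recommended in the thread, keeps the number of unique strings per field small before scaling up.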
