Pascal - Your idea reminds me a bit of Banjo: http://ban.jo/
This is a private corporation, but they're doing something somewhat similar, at least in that they have divided the globe up into a giant grid and run anomaly detection within each cell of that grid. Except, instead of geophysical data, they are monitoring social activity by observing geotagged photos, tweets, posts, etc.

- Jeff

On Tue, Aug 4, 2015 at 3:27 PM Jared Casner <[email protected]> wrote:

> Hi Pascal,
>
> So, let me see if I understand correctly. For now, you don't require any
> geo-encoding of data (though it sounds like that might be a useful feature
> in the future?). Instead, you will create a list of regions/polygons that
> represent geofenced areas. Within each region, you will have some set of
> sensors producing scalar data: air pressure, humidity, wind speed, seismic
> activity, temperature, etc. Your goal is to generate anomaly scores for
> each of those sensors. You then plan to run logistic regression on top of
> the anomaly scores to predict the likelihood of a natural disaster
> (earthquake, meteorological, etc.) in that region or nearby regions. It
> would be up to the statistician to correlate regions in the short term,
> correct? Also, if I've understood you correctly, the biggest issue
> researchers currently face with this problem is that their per-sensor
> predictions aren't always accurate because of unexpected daily variations
> in the data?
>
> I hope I've now understood the problem, but please clarify if I've
> mis-stated anything.
>
> Assuming I have a basic understanding of the problem, I think you may be
> able to simplify the engineering task a little bit. It seems to me that
> your primary objective isn't an easy-to-read user interface that displays
> data to an end user. Instead, you want the data available to researchers
> in a format they can run the logistic regression on. So perhaps you can
> simplify your project by starting with HTMEngine directly.
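The region layout Jared describes - a set of geofenced polygons, each holding scalar sensors - could be sketched minimally like this. This is a plain ray-casting point-in-polygon test; the region id `R001`, the coordinates, and the function names are invented for illustration and are not part of any NuPIC or HTMEngine API:

```python
# Hypothetical sketch: map a (lon, lat) reading to one of the geofenced
# regions. Region ids and polygon coordinates below are made up.
regions = {
    "R001": [(-122.5, 37.7), (-122.3, 37.7), (-122.3, 37.9), (-122.5, 37.9)],
}

def point_in_polygon(lon, lat, poly):
    """Ray-casting test; poly is a list of (lon, lat) vertices."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Does the horizontal ray from (lon, lat) cross this edge?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

def region_for(lon, lat):
    """Return the label of the region containing the point, if any."""
    for label, poly in regions.items():
        if point_in_polygon(lon, lat, poly):
            return label
    return None
```

In practice a geometry library would likely replace the hand-rolled test, but the point is only that each sensor reading can be bucketed into a labeled polygon before it reaches the model.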
> I'm sure by now you've seen Matt's demo [1] of HTMEngine - that may be a
> good place to start. In his NYC Traffic demo [2], each road segment
> represents a geolocation and has a scalar metric (average speed)
> associated with it. Assuming you have easy access to the data, you can
> probably use this as a good basis for getting started. The output is
> available in both JSON and CSV formats, so it should be easily accessible
> to a researcher.
>
> To answer one of your original questions about Numenta engineers helping
> out on this project: they're all free to help in their off time! One of
> our big objectives in opening access to NuPIC and the Numenta Apps was to
> provide a means for you - and those like you - to get in and do things
> that we just don't have the bandwidth to do internally. I'm thrilled to
> see your excitement and hope that others in the community will want to
> get involved to help you out!
>
> Cheers,
>
> Jared
>
> [1] https://www.youtube.com/watch?v=lzJd_a6y6-E
> [2] https://github.com/nupic-community/htmengine-traffic-tutorial
>
>> ---------- Forwarded message ----------
>> From: Pascal Weinberger <[email protected]>
>> To: "NuPIC general mailing list." <[email protected]>
>> Date: Tue, 4 Aug 2015 12:13:04 +0200
>> Subject: Re: nostradamIQ Project help needed!
>>
>> Matt,
>>
>> That's true, but you do not need it at all: take the world, slice it
>> into polygons (according to the density of available data and the
>> resolution needed), label your polygons, and fetch your data for each
>> polygon under a label of the form Where:What - "where" being the label
>> of the specific geo-area in the scheme above, and "what" the label for
>> the kind of data you push (seismic, etc.). And there you have your data
>> format: label to scalar!
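The Where:What labeling Pascal describes boils down to flattening every sensor reading into a (label, scalar) pair. A minimal sketch, with invented region ids, sensor names, and values (the `make_label` helper is hypothetical, not an HTMEngine function):

```python
# Hypothetical sketch of the "Where:What" label-to-scalar format:
# one metric name per (region, sensor type) combination.
def make_label(region, metric):
    """Combine a region id and a sensor type into a Where:What label."""
    return "{}:{}".format(region, metric)

# Invented sample readings: (region, sensor type, scalar value).
readings = [
    ("R001", "seismic", 0.3),
    ("R001", "humidity", 71.0),
    ("R002", "seismic", 2.1),
]

# Flatten into the label-to-scalar pairs a metric store could ingest.
labeled = [(make_label(region, metric), value)
           for region, metric, value in readings]
```

Each distinct label would then correspond to one HTMEngine metric, which is exactly the per-metric scalar setup the traffic tutorial uses for road segments.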
>> Now htmengine outputs anomaly scores for each Where:What label, and you
>> take these to build logistic regression models hierarchically (in a
>> geo-hierarchy), trained on the anomaly output plus a binary value for
>> whether a certain disaster happened there at some time X later or not.
>> (This needs some historical data, which is why the highest priority is
>> getting the data polled and htmengine trained.) You go for logistic
>> regression because that is what the literature finds performs best. Once
>> that works, you have your 'live' data stream and get predictions in the
>> form of probabilities of the disaster occurring X time in the future...
>>
>> That was the basic idea... of course you will need to test it and
>> refine the architecture, etc. But there's your workaround :)
>>
>> So htmengine is not supposed to do the entire job; it's more for
>> feature detection :) The problem researchers find when building log-reg
>> models on real data (raw scalars from the sensors) is that they
>> periodically make wrong predictions due to daily and similar patterns.
>> This is what HTM should filter out ;)
>>
>> The point of using Taurus as a starting point, therefore, is that you
>> already have your basic infrastructure of companies (your geo-polygons)
>> and different metrics (the different sensor data in that region)...
>>
>> Does it make more sense now? :) Of course a geo-encoder and the like
>> would be nice in addition, to capture more of the patterns, but this is
>> what I would hope to achieve with the geo-hierarchy of log-reg models,
>> so they capture the spatial relationships in their input weights (of
>> course only based on historical data)... I do not think the geo-encoder
>> would get this as well. When running the demo app, you find that
>> geo-encoding with radius = magnitude (or any exponential function
>> thereof) makes HTM immune to regions where at least one strong quake
>> happened... and you don't want that.
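The second stage Pascal outlines - logistic regression trained on anomaly scores against a binary "disaster within window X" label - could look roughly like this. The sketch uses a tiny hand-rolled gradient-descent fit so it is self-contained; in practice something like scikit-learn would do this step, and the anomaly scores and labels below are invented:

```python
import math

def train_logreg(X, y, lr=0.5, epochs=2000):
    """Tiny stochastic-gradient-descent logistic regression.

    X: rows of anomaly scores per region, y: 0/1 disaster outcomes.
    Returns (weights, bias)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))   # sigmoid
            err = p - yi                     # gradient of log loss
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict_proba(w, b, x):
    """Probability of a disaster within window X, given anomaly scores."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

# Invented training set: each row holds anomaly scores for two sensors
# (say seismic and pressure) in a region; y marks whether a disaster
# followed within the window.
X = [[0.05, 0.1], [0.1, 0.05], [0.9, 0.8], [0.85, 0.95]]
y = [0, 0, 1, 1]
w, b = train_logreg(X, y)
```

The geo-hierarchy Pascal mentions would then just be several such models, one per level of the region hierarchy, each fed the anomaly scores of the regions below it.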
>> But David, you may think about building an engine for Java as well :)
>> Just because it's faster ;D
>>
>> _______________________________________________
>> nupic mailing list
>> [email protected]
>> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
