Re: htm.java config questions

Takenori Sato Wed, 04 Nov 2015 22:48:13 -0800

Thanks, David! I added you as a collaborator.

- Takenori


On Thu, Nov 5, 2015 at 3:25 PM, cogmission (David Ray) <
[email protected]> wrote:

> Hi Takenori,
>
> Running a swarm is always an option. Can you give me push rights to your
> repo and check in some (small) example of data so I can have something to
> run, and I'll take a look? I'll see if I can get it up and running and push
> it back to your repo...
>
> Cheers,
> David
>
> On Wed, Nov 4, 2015 at 10:30 PM, Takenori Sato <[email protected]> wrote:
>
>> Hi David, thanks for your answers!
>>
>> I tried a few things, such as adding a SpatialPooler and changing n/w, but no luck.
>>
>> Perhaps I should run swarming in python against my data,
>> and study the configuration produced.
>>
>> - Takenori
>>
>> On Thu, Nov 5, 2015 at 3:44 AM, cogmission (David Ray) <
>> [email protected]> wrote:
>>
>>> Hi Takenori,
>>>
>>> You might think this is weird (I know I do), but as I am basically just
>>> one person writing and supporting HTM.java (with some appreciated help from
>>> community members from time to time), I haven't really had the time to
>>> **use** NuPIC. Therefore the scope of the questions I can faithfully answer
>>> is limited to setting up and using the code, together with any Java
>>> related questions. NuPIC configurations that have to do with performance of
>>> the HTM (like DateEncoder parameters, the size of W and N, and actual
>>> parameter settings) are something that anyone familiar with NuPIC, who has
>>> struggled with that learning curve, can answer for you.
>>>
>>> The default parameters used are those that were in the Python network
>>> examples and settings that I have been told are "decent" when asking for
>>> help myself. NuPIC parameters are not easy, and require knowledge of the
>>> "rules of thumb" (typical rules for usage). For instance, W should be an
>>> odd number for reasons having to do with finding the "center" of a series
>>> of bits. Also, if you read the class documentation for Encoder.java or
>>> base.py (the abstract base encoder for the Python version), you will
>>> see some discussion of N and W and how they relate to each other.
>>>
>>> In general, the difference between the ScalarEncoder and the
>>> RandomDistributedScalarEncoder is that the ScalarEncoder is a bit more
>>> efficient but requires prior knowledge of the min and max values in your
>>> expected dataset. The RDSE can be used without prior knowledge of the
>>> bounds and so is a nice alternative for unknown data. Most people just use
>>> the RDSE.
>>>
>>> Here's a video that discusses the RDSE:
>>> https://www.youtube.com/watch?v=_q5W2Ov6C9E
>>>
>>> The DateEncoder class Javadoc, and the class file itself (together with
>>> DateEncoderTest.java), have lots of documentation in them which illustrate
>>> their usage. Basically, a DateEncoder is a compound encoder that has
>>> ScalarEncoders inside it which handle different aspects of the date
>>> mechanism being used.
>>>
>>> The SpatialPooler is an integral part of the HTM - you usually want
>>> that. The only time when that has been "skipped" is when inserting an
>>> encoding scheme of your own and you want to preserve the input format. But
>>> that is an extreme corner case; I would advise using one in your code.
>>>
>>> Don't worry about multiple regions and layers. The capacity to have
>>> multiple regions and layers exists for those who need extra flexibility.
>>> The ability to assemble Network hierarchies is mostly a "space saver" for
>>> when HTM Hierarchy code is released by Numenta in the future. The "modes"
>>> shown in the HotGym Demo are just there for demonstration purposes and
>>> really there is no internal concept of "Mode" inside the Network hierarchy.
>>> Again, the Mode in the demo is just a switch to instruct the demo to set up
>>> different hierarchy styles to show that the output is the same regardless
>>> of the number of hierarchical components used to funnel data through.
>>>
>>> I hope this helps. You can ask Numenta engineers for rules of thumb
>>> regarding the individual Parameter settings.
>>>
>>> Cheers,
>>> David
>>>
>>> On Wed, Nov 4, 2015 at 9:45 AM, Takenori Sato <[email protected]>
>>> wrote:
>>>
>>>> Hi NuPIC community and David,
>>>>
>>>> I have some questions about how to configure my network with htm.java.
>>>>
>>>> My use case is to let HTM detect an unexpectedly high load on a server
>>>> through PING response times. But so far, it produces 0.0 for almost any
>>>> input. Sometimes it returns some other value, but those values are not
>>>> reasonable at all.
>>>>
>>>> The biggest problem is that I am not sure at all about my
>>>> configuration, so I strongly suspect it is far from correct.
>>>>
>>>> For your reference, you can see my code here:
>>>>
>>>> CloudSonar project <https://github.com/ggsato/CloudSonar>
>>>> HTMAnomalyDetector
>>>> <https://github.com/ggsato/CloudSonar/blob/master/src/com/cloudian/analytics/HTMAnomalyDetector.java>
>>>>
>>>> My network configuration is based on (or rather, copied and pasted from)
>>>> NetworkDemoHarness. It is modified slightly in the places I believe I
>>>> understand.
>>>>
>>>> Here are my questions.
>>>>
>>>> *1. Parameters#getAllDefaultParameters*
>>>>
>>>> private static Network createNetwork(Sensor<ObservableSensor<String>> sensor) {
>>>>     Parameters p = buildParams();
>>>>     p = p.union(buildEncoderParams());
>>>>     return Network.create("CloudSonar", p)
>>>>         .add(Network.createRegion("Region")
>>>>             .add(Network.createLayer("Layer", p)
>>>>                 .alterParameter(KEY.AUTO_CLASSIFY, Boolean.TRUE)
>>>>                 .add(Anomaly.create())
>>>>                 .add(new TemporalMemory())
>>>>                 .add(sensor)
>>>>             )
>>>>         );
>>>> }
>>>>
>>>> private static Parameters buildParams() {
>>>>     return Parameters.getAllDefaultParameters(); // <== THIS ONE
>>>> }
>>>>
>>>> NetworkDemoHarness#getParameters confused me with its many parameters, so I
>>>> picked up only the defaults without overriding anything. Can I start
>>>> like this?
>>>>
>>>> Also, are there any resources to learn about those parameters?
>>>>
>>>> *2. Encoders*
>>>>
>>>> My inputs are [timestamps, duration_in_micro_sec].
>>>>
>>>> private static String generateCSVInput(PollingJob job) {
>>>>     StringBuffer sb = new StringBuffer();
>>>>     sb.append(FULL_DATE_FORMAT.format(new Date())); // <== TIMESTAMP
>>>>     sb.append(CSVUpdateHandler.DELIM);
>>>>     sb.append(TimeUnit.MICROSECONDS.convert(job.pollingStatus.duration(), TimeUnit.NANOSECONDS)); // <== DURATION
>>>>     return sb.toString();
>>>> }
>>>>
>>>> I borrowed the config from NetworkDemoHarness#getHotGymFieldEncodingMap
>>>> and getNetworkDemoFieldEncodingMap (I noticed I had mixed them up). Then I
>>>> modified the highlighted parts:
>>>>
>>>>     public static Map<String, Map<String, Object>> getNetworkFieldEncodingMap() {
>>>>         Map<String, Map<String, Object>> fieldEncodings = setupMap(
>>>>                 null,
>>>>                 0, // n
>>>>                 0, // w
>>>>                 0, 0, 0, 0, null, null, null,
>>>>                 "timestamp", "datetime", "DateEncoder");
>>>>         fieldEncodings = setupMap(
>>>>                 fieldEncodings,
>>>>                 50, // n
>>>>                 21, // w
>>>>                 0, 10000000, 0, 0.1, null, Boolean.TRUE, null, // <== 0 ~ 10 sec
>>>>                 CLASSFIER_FIELD, "int", "ScalarEncoder");
>>>>
>>>>         fieldEncodings.get("timestamp").put(KEY.DATEFIELD_DOFW.getFieldName(), new Tuple(1, 1.0)); // Day of week
>>>>         fieldEncodings.get("timestamp").put(KEY.DATEFIELD_TOFD.getFieldName(), new Tuple(5, 4.0)); // Time of day
>>>>         fieldEncodings.get("timestamp").put(KEY.DATEFIELD_PATTERN.getFieldName(), FULL_DATE);
>>>>
>>>>         return fieldEncodings;
>>>>     }
>>>>
>>>> Why are all the params of DateEncoder 0 or null?
>>>>
>>>> What is the difference between ScalarEncoder
>>>> and RandomDistributedScalarEncoder?
>>>>
>>>> I happened to use the larger n and w used
>>>> by getNetworkDemoFieldEncodingMap. Compared to the HotGym demo, my durations
>>>> are much larger than the consumption values. So a larger n makes sense, but
>>>> should I have set a lower w, like 6?
>>>>
>>>> I wasn't able to find information on how to set those DATEFIELD
>>>> parameters. PATTERN was obvious, but the other two remained unclear.
>>>> In particular, what is the Tuple, and what do those numbers mean?
>>>>
>>>> *3. SpatialPooler*
>>>>
>>>> NetworkAPIDemo uses SpatialPooler in every network. But it should be
>>>> related to spatial inputs, correct? So I dropped it from my network
>>>> configuration. I have read the JavaDoc, but got no clue. What is it for?
>>>>
>>>> *4. Multiple Regions and Layers*
>>>>
>>>> I wasn't able to understand the difference between those 3 modes in
>>>> NetworkAPIDemo. I understand MULTILAYER uses multiple layers, and
>>>> MULTIREGION uses multiple regions. But when to use which mode in practice?
>>>>
>>>>
>>>> I know these are all stupid questions, but overall I was impressed
>>>> that the design makes it easy to understand how to integrate htm.java into
>>>> my own application!!
>>>>
>>>> Thanks,
>>>> Takenori
>>>>
>>>
>>>
>>>
>>> --
>>> *With kind regards,*
>>>
>>> David Ray
>>> Java Solutions Architect
>>>
>>> *Cortical.io <http://cortical.io/>*
>>> Sponsor of:  HTM.java <https://github.com/numenta/htm.java>
>>>
>>> [email protected]
>>> http://cortical.io
>>>
>>
>>
>
>
> --
> *With kind regards,*
>
> David Ray
> Java Solutions Architect
>
> *Cortical.io <http://cortical.io/>*
> Sponsor of:  HTM.java <https://github.com/numenta/htm.java>
>
> [email protected]
> http://cortical.io
>
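
A note on the n and w discussion in the thread above: the sketch below is a
simplified, self-contained model of a non-periodic scalar encoding, written
only to make the roles of n, w, min and max concrete. It is not HTM.java's
ScalarEncoder code, and the class and method names (ScalarEncodingSketch,
firstActiveBit, centerBit, bucketWidth) are hypothetical. It assumes that w
contiguous bits are active, that the block of active bits can start at any
position from 0 to n - w, and that values are clipped to [min, max]. The
numbers in main are the ones from the quoted configuration (n = 50, w = 21,
0 to 10,000,000 microseconds).

```java
// Simplified model of a non-periodic scalar encoding (an assumption for
// illustration, NOT the HTM.java implementation).
final class ScalarEncodingSketch {

    /** Index of the first active bit for a value clipped to [min, max]. */
    static int firstActiveBit(double value, int n, int w, double min, double max) {
        if (w <= 0 || w > n || max <= min) {
            throw new IllegalArgumentException("need 0 < w <= n and min < max");
        }
        double clipped = Math.max(min, Math.min(max, value));
        double fraction = (clipped - min) / (max - min);
        int lastStart = n - w; // highest start index that still fits w bits
        int start = (int) Math.floor(fraction * lastStart);
        return Math.min(start, lastStart);
    }

    /** With an odd w, the active block has a single, well-defined center bit. */
    static int centerBit(double value, int n, int w, double min, double max) {
        return firstActiveBit(value, n, w, min, max) + (w - 1) / 2;
    }

    /** Returns n bits with w contiguous bits set to true. */
    static boolean[] encode(double value, int n, int w, double min, double max) {
        boolean[] bits = new boolean[n];
        int start = firstActiveBit(value, n, w, min, max);
        for (int i = start; i < start + w; i++) {
            bits[i] = true;
        }
        return bits;
    }

    /** Width of the input range covered by one start position (requires w < n). */
    static double bucketWidth(int n, int w, double min, double max) {
        return (max - min) / (n - w);
    }

    public static void main(String[] args) {
        int n = 50;
        int w = 21;
        double min = 0;
        double max = 10_000_000; // microseconds, i.e. 0 to 10 seconds

        System.out.println("bucket width (microseconds): " + bucketWidth(n, w, min, max));
        for (double micros : new double[] {1_000, 200_000, 400_000, 10_000_000}) {
            System.out.println(micros + " -> first active bit "
                    + firstActiveBit(micros, n, w, min, max));
        }
    }
}
```

Under these assumptions each start position covers roughly 344,828
microseconds (10,000,000 / 29), so a 1 ms and a 200 ms response time would
produce identical bits. If the real encoder behaves similarly, a much
narrower min/max range, or the RandomDistributedScalarEncoder mentioned in
the thread, may be worth trying for PING durations.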
