Re: [racket-users] Re: Decision Tree in Racket - Performance

2017-08-08 Thread Robby Findler
I see that I was misled by raco demod, which suggests that this: (define (f x) (match-define (cons a b) x) (+ a b)) compiles into this: (define-values (_f) (lambda (arg0-21) '#(f # 2 0 14 54 #f) '(flags: preserves-marks single-result) '(captures:

Re: [racket-users] Re: Decision Tree in Racket - Performance

2017-08-08 Thread Sam Tobin-Hochstadt
On Thu, Jul 27, 2017 at 8:36 PM, Robby Findler wrote: > I see from raco demod that match-define does not generate very > beautiful code, which leads me to this version (which is what I think > match-define should have expanded into). It seems to be about 4x > faster

[racket-users] Re: Decision Tree in Racket - Performance

2017-08-07 Thread Zelphir Kaltstahl
I've now implemented post pruning with 2 test cases as well. -- You received this message because you are subscribed to the Google Groups "Racket Users" group. To unsubscribe from this group and stop receiving emails from it, send an email to racket-users+unsubscr...@googlegroups.com. For more

[racket-users] Re: Decision Tree in Racket - Performance

2017-08-01 Thread Zelphir Kaltstahl
I think I've now implemented most of the optimizations mentioned in this topic. I also adapted my code to use some of the neat things I found in Daniel's code. The implementation now supports most of the optimization parameters for decision trees, which I found on:

Re: [racket-users] Re: Decision Tree in Racket - Performance

2017-07-28 Thread Philip McGrath
Specifically, as Robby said earlier, `list?` is memoized, so e.g. (first (rest (build-list 10 values))) only pays this price once. And `list?` rejects pairs that have cycles. -Philip On Fri, Jul 28, 2017 at 5:51 AM, Matthias Felleisen wrote: > > first and rest were

Re: [racket-users] Re: Decision Tree in Racket - Performance

2017-07-28 Thread Matthias Felleisen
first and rest were introduced in the teaching languages first when we decided it was about principles of design and cadadar was cute but nobody should have care. first and rest are about lists in other languages and the names say so. car and cdr are about pairs (not that their names say so)

Re: [racket-users] Re: Decision Tree in Racket - Performance

2017-07-28 Thread Robby Findler
It could have been. I am not sure why (but it probably had something to do with better checking for the teaching languages, Matthias may recall more). Robby On Fri, Jul 28, 2017 at 4:19 AM Daniel Prager wrote: > Interesting stuff, but if I may probe a little deeper

Re: [racket-users] Re: Decision Tree in Racket - Performance

2017-07-28 Thread Neil Van Dyke
As soon as I sent it, I saw I was conflating lists with what CL calls proper lists. I'll just link: http://docs.racket-lang.org/reference/pairs.html http://clhs.lisp.se/Body/t_list.htm http://www.schemers.org/Documents/Standards/R5RS/HTML/r5rs-Z-H-9.html#%_sec_6.3.2 -- You received this

Re: [racket-users] Re: Decision Tree in Racket - Performance

2017-07-28 Thread Daniel Prager
Interesting stuff, but if I may probe a little deeper into scheme history, why couldn't first have simply been defined as a synonym for car (i.e. first item in a pair) and rest a synonym for cdr? Dan On Fri, Jul 28, 2017 at 6:09 PM, Neil Van Dyke wrote: > Daniel Prager

Re: [racket-users] Re: Decision Tree in Racket - Performance

2017-07-28 Thread Neil Van Dyke
Daniel Prager wrote on 07/28/2017 01:03 AM: > `first` and `rest` need to check if the input is a list. Why is that? When Racket/Scheme/Lisp people speak of checking whether something is a list, I think they usually mean in the sense of the predicate `list?` (or `listp`), which is usually an

[racket-users] Re: Decision Tree in Racket - Performance

2017-07-28 Thread Zelphir Kaltstahl
In general what is the opinion on replacing `first` and `rest` with `car` and `cdr`, when considering readability? I find `first` and `rest` very readable and remember that I got quite confused when I started learning Racket with all the cadrdrdrd ;) I still think `first` and `rest` reads

Re: [racket-users] Re: Decision Tree in Racket - Performance

2017-07-28 Thread Zelphir Kaltstahl
On Friday, July 28, 2017 at 2:02:56 AM UTC+2, gustavo wrote: > I agree with the in-list explanation, but I want to remark a few details. > > >> I don't really understand the (in-list ...) thing. This seems to be > >> internal magic to me. > > `in-list` is not a function, it's a macro that looks

Re: [racket-users] Re: Decision Tree in Racket - Performance

2017-07-28 Thread Zelphir Kaltstahl
On Thursday, July 27, 2017 at 10:17:00 PM UTC+2, Daniel Prager wrote: > > Wow, are those timings for the "big" data set?! > > > I use 1/5 of the data as a training set, in line with my understanding of the > original article, which splits it in 5. > > I use the remaining 4/5 as a single

Re: [racket-users] Re: Decision Tree in Racket - Performance

2017-07-27 Thread Daniel Prager
> `first` and `rest` need to check if the input is a list. Why is that? On Fri, Jul 28, 2017 at 1:09 PM, Gustavo Massaccesi wrote: > I was convinced that case-lambas were much slower. > > I made some tests, but I replaced first/rest with car/cdr beause it's > much faster.

Re: [racket-users] Re: Decision Tree in Racket - Performance

2017-07-27 Thread Gustavo Massaccesi
I was convinced that case-lambas were much slower. I made some tests, but I replaced first/rest with car/cdr beause it's much faster. Also I repeated the call 1000 times. I got almost 1/2 second of difference. opt cpu time: 12266 real time: 12264 gc time: 0 acc cpu time: 11750 real time: 11745

Re: [racket-users] Re: Decision Tree in Racket - Performance

2017-07-27 Thread Robby Findler
On Thu, Jul 27, 2017 at 8:13 PM, Philip McGrath wrote: > ; timed in DrRacket w/ debugging This is probably not a good idea. More information here: http://docs.racket-lang.org/guide/performance.html Robby -- You received this message because you are subscribed to the

Re: [racket-users] Re: Decision Tree in Racket - Performance

2017-07-27 Thread Robby Findler
`first` and `rest` need to check if the input is a list (a linear-time but memoized test), whereas `car` and `cdr` check only if the inputs are `pair?`s, and match-define uses unsafe operations, doing the pair check only once. I see from raco demod that match-define does not generate very

Re: [racket-users] Re: Decision Tree in Racket - Performance

2017-07-27 Thread Philip McGrath
`match-define` seems to be the real source of the speedup for me: #lang racket (define (as) (build-list 100 (λ (n) (random 100 (define (bs) (build-list 100 (λ (n) (random 100 (define (f as bs [acc 0]) (if (or (null? as) (null? bs)) acc (f (rest as) (rest bs) (+

Re: [racket-users] Re: Decision Tree in Racket - Performance

2017-07-27 Thread Daniel Prager
The performance hit of first / rest vs car / cdr is a little disquieting from a readability POV. Very informative discussion, though! Dan -- You received this message because you are subscribed to the Google Groups "Racket Users" group. To unsubscribe from this group and stop receiving emails

Re: [racket-users] Re: Decision Tree in Racket - Performance

2017-07-27 Thread Matthias Felleisen
> On Jul 27, 2017, at 9:18 PM, Matthew Flatt wrote: > > I don't think optional arguments are all that slow. This isn’t just Matthew’s opinion. Vincent’s feature-specific profiler could not detect a significant impact of optional or keyword arguments. (See Compiler

Re: [racket-users] Re: Decision Tree in Racket - Performance

2017-07-27 Thread Matthew Flatt
At Thu, 27 Jul 2017 22:08:33 -0300, Gustavo Massaccesi wrote: > Functions with optional arguments are slow. They are expanded to a > case-lambda and an if. This version with an auxiliary function is > faster: I don't think optional arguments are all that slow. You're seeing the result of `list?`

Re: [racket-users] Re: Decision Tree in Racket - Performance

2017-07-27 Thread Philip McGrath
*Slightly too significant, since I wrote g incorrectly. This is right: #lang racket (collect-garbage) (collect-garbage) (collect-garbage) (define as (build-list 100 (λ (n) (random 100 (define bs (build-list 100 (λ (n) (random 100 (define (f as bs [acc 0]) (if (or (null? as)

Re: [racket-users] Re: Decision Tree in Racket - Performance

2017-07-27 Thread Philip McGrath
Eliminating the optional argument and using match-define to avoid any redundant checking on first and rest gets me a significant speedup: #lang racket (collect-garbage) (collect-garbage) (collect-garbage) (define as (build-list 100 (λ (n) (random 100 (define bs (build-list 100 (λ

Re: [racket-users] Re: Decision Tree in Racket - Performance

2017-07-27 Thread Gustavo Massaccesi
Functions with optional arguments are slow. They are expanded to a case-lambda and an if. This version with an auxiliary function is faster: #lang racket (define as (build-list 100 (λ (n) (random 100 (define bs (build-list 100 (λ (n) (random 100 (define (f/opt as bs [acc 0])

Re: [racket-users] Re: Decision Tree in Racket - Performance

2017-07-27 Thread Jon Zeppieri
You can get better performance out of the recursive function by using car/cdr instead of first/rest; first/rest require their arguments to be lists, whereas car/cdr require theirs to be pairs, which is a lot cheaper to check. Also, using an optional argument (in a loop, especially) makes a

Re: [racket-users] Re: Decision Tree in Racket - Performance

2017-07-27 Thread Daniel Prager
Revised to collect garbage before each timing shows the recursive function isn't too bad (but not great): cpu time: 405 real time: 404 gc time: 0 2452578471 cpu time: 368 real time: 368 gc time: 0 2452578471 cpu time: 50 real time: 50 gc time: 0 2452578471 cpu time: 194 real time: 195 gc time:

Re: [racket-users] Re: Decision Tree in Racket - Performance

2017-07-27 Thread Daniel Prager
To dramatise Gustavo's point, here's a quick comparison of some different methods: #lang racket (collect-garbage) (collect-garbage) (collect-garbage) (define as (build-list 100 (λ (n) (random 100 (define bs (build-list 100 (λ (n) (random 100 (define (f as bs [acc 0]) (if (or

Re: [racket-users] Re: Decision Tree in Racket - Performance

2017-07-27 Thread Gustavo Massaccesi
I agree with the in-list explanation, but I want to remark a few details. >> I don't really understand the (in-list ...) thing. This seems to be internal >> magic to me. `in-list` is not a function, it's a macro that looks like a function. `for` is another macro (that doesn't look like a

Re: [racket-users] Re: Decision Tree in Racket - Performance

2017-07-27 Thread Daniel Prager
Zelphir: > I don't really understand the (in-list ...) thing. This seems to be internal magic to me. Maybe it informs about the type of what is iterated over and that makes it faster? Pretty much. Here's a program that constructs a lot of random numbers and adding them up in a few different ways.

Re: [racket-users] Re: Decision Tree in Racket - Performance

2017-07-27 Thread Neil Van Dyke
Before each `time` call, so that you're not getting really bad data due to GC costs of other code, do this: (collect-garbage) (collect-garbage) (collect-garbage) The three times, like "there's no place like home", is a bit of folklore that I'm not certain is still necessary or sufficient,

Re: [racket-users] Re: Decision Tree in Racket - Performance

2017-07-27 Thread Daniel Prager
> Wow, are those timings for the "big" data set?! I use 1/5 of the data as a training set, in line with my understanding of the original article, which splits it in 5. I use the remaining 4/5 as a single validation set rather than four separate sets (because I got lazy). That won't impact the

Re: [racket-users] Re: Decision Tree in Racket - Performance

2017-07-27 Thread Jon Zeppieri
On Thu, Jul 27, 2017 at 4:01 PM, Zelphir Kaltstahl wrote: > Wow, are those timings for the "big" data set?! > Unless I'm mistaken, Daniel's times are for 274 rows of the big data set, not the whole thing. -- You received this message because you are subscribed to

[racket-users] Re: Decision Tree in Racket - Performance

2017-07-27 Thread Zelphir Kaltstahl
I had another idea for the implementation, which I tried to implement. I tried to store in each node a procedure, which can be used when predicting data point labels, so that I can simply get that from the struct and call it with the data point and get back, whether I should go left or right at

Re: [racket-users] Re: Decision Tree in Racket - Performance

2017-07-27 Thread Zelphir Kaltstahl
On Wednesday, July 26, 2017 at 3:17:46 PM UTC+2, gustavo wrote: > I read the solution of Daniel Prager and I have a few minor changes. > > * First I added a test that repeats `build-tree` 20 times, so the run > time is approximately 1-2 seconds. This is not necessary, but times > smaller than 1

Re: [racket-users] Re: Decision Tree in Racket - Performance

2017-07-27 Thread Daniel Prager
Thanks! That's a useful technique, and your adjutant utilities look handy. There should (could?) be an annual poll for promotion of the best conveniences into the mainstream of Racket! Dan On 27 Jul. 2017 1:43 pm, "Philip McGrath" wrote: > Re multiple return values,

Re: [racket-users] Re: Decision Tree in Racket - Performance

2017-07-26 Thread Philip McGrath
Re multiple return values, you can write (call-with-values (λ () (values 1 2 3)) list) Because this problem bugs me, I've also written a package adjutor that includes values->list, list->values, and for/fold/define: http://docs.racket-lang.org/adjutor/index.html -Philip On Wed, Jul 26, 2017 at

Re: [racket-users] Re: Decision Tree in Racket - Performance

2017-07-26 Thread Daniel Prager
Hi Gustavo Nice catch on the in-list front. I keep forgetting to do this, although I usually remember to use in-range when iterating from 0 to n-1. I tend to omit in-list for convenience rather than because I'm writing generic algorithms. I wonder what the general usage patterns are like. I

Re: [racket-users] Re: Decision Tree in Racket - Performance

2017-07-26 Thread Gustavo Massaccesi
I read the solution of Daniel Prager and I have a few minor changes. * First I added a test that repeats `build-tree` 20 times, so the run time is approximately 1-2 seconds. This is not necessary, but times smaller than 1 second are sometimes not reliable. I'm using: ;--- (random-seed 12345)

Re: [racket-users] Re: Decision Tree in Racket - Performance

2017-07-26 Thread Daniel Prager
Actually, I only use multiple values because that's what for/fold returns. Is there a way to convert from values to a list without going through define-values or similar? Dan On 26 Jul. 2017 09:40, "Zelphir Kaltstahl" wrote: I've come to the conclusion, that not

Re: [racket-users] Re: Decision Tree in Racket - Performance

2017-07-25 Thread Zelphir Kaltstahl
I've come to the conclusion, that not assuming binary classification makes no sense, since every n-class classification problem can be split into n binary classification problems. When assuming classes 0 and 1, the performance increases and I get to: cpu time: 608 real time: 605 gc time: 68

Re: [racket-users] Re: Decision Tree in Racket - Performance

2017-07-24 Thread Zelphir Kaltstahl
On Monday, July 24, 2017 at 11:36:00 PM UTC+2, Daniel Prager wrote: > Hi Zelphir > > Thanks for the attribution. > > I'm running on a MacBook Air, 2012 vintage. > > Why not run both my and your code on your machine and compare? > > I made no optimisations other than assuming binary

Re: [racket-users] Re: Decision Tree in Racket - Performance

2017-07-24 Thread Daniel Prager
Hi Zelphir Thanks for the attribution. I'm running on a MacBook Air, 2012 vintage. Why not run both my and your code on your machine and compare? I made no optimisations other than assuming binary classification. Dan On Tue, Jul 25, 2017 at 6:46 AM, Zelphir Kaltstahl <

Re: [racket-users] Re: Decision Tree in Racket - Performance

2017-07-24 Thread Zelphir Kaltstahl
On Monday, July 24, 2017 at 10:04:36 PM UTC+2, Daniel Prager wrote: > Jon wrote: > > Aside: if I read Daniel's solution correct, he avoids the first issue by > >assuming that it's a binary classification task (that is, that there are > >only two classes). > > > Yep: I'm assuming binary

Re: [racket-users] Re: Decision Tree in Racket - Performance

2017-07-24 Thread Daniel Prager
Jon wrote: > Aside: if I read Daniel's solution correct, he avoids the first issue by assuming that it's a binary classification task (that is, that there are only two classes). Yep: I'm assuming binary classification. David wrote: > Out of curiosity, how much of that 0.5 seconds is overhead?

Re: [racket-users] Re: Decision Tree in Racket - Performance

2017-07-24 Thread Zelphir Kaltstahl
On Monday, July 24, 2017 at 6:44:23 PM UTC+2, Matthias Felleisen wrote: > > On Jul 24, 2017, at 12:37 PM, Zelphir Kaltstahl > > wrote: > > > > In general I think available libraries are important for creating > > attraction to a programming language. For example I

Re: [racket-users] Re: Decision Tree in Racket - Performance

2017-07-24 Thread Matthias Felleisen
> On Jul 24, 2017, at 12:37 PM, Zelphir Kaltstahl > wrote: > > In general I think available libraries are important for creating attraction > to a programming language. For example I like Python's machine learning and > data munging libraries and this is one

Re: [racket-users] Re: Decision Tree in Racket - Performance

2017-07-24 Thread Zelphir Kaltstahl
Already fixed that `class-labels` issue, thanks, it was a quick fix today morning. I already tested it with that change and it was way faster. I thought about going for binary classification only while programming, but I think I managed to keep it generic for multiple classes at least in most

Re: [racket-users] Re: Decision Tree in Racket - Performance

2017-07-24 Thread Jon Zeppieri
Just want to emphasize that the main source of inefficiency in your code is what I mentioned in my last message (iterating over the class labels of each row instead of the unique class labels of the entire data set). The second biggest factor is your structural recursion over a non-recursive

Re: [racket-users] Re: Decision Tree in Racket - Performance

2017-07-24 Thread David Storrs
Out of curiosity, how much of that 0.5 seconds is overhead? Could you run a simple 'add 1 and 1' procedure and see how long it takes? On Mon, Jul 24, 2017 at 9:18 AM, Daniel Prager wrote: > Hi Zelphir > > Thank-you for a fun exercise! > > My implementation using

[racket-users] Re: Decision Tree in Racket - Performance

2017-07-24 Thread Zelphir Kaltstahl
> - Are you running this code inside DrRacket? If so, have you timed the > difference between running it with debugging enabled and with no > debugging or profiling? (Language -> Choose Language... -> Details) I am running from terminal with: ```racket -l errortrace -t FILE``` and ```raco test

[racket-users] Re: Decision Tree in Racket - Performance

2017-07-24 Thread Zelphir Kaltstahl
Wow, thanks for all the feedback, I'll try to get most of the mentioned stuff done and then post an update : ) > I teach trees and decision trees to freshman students who have never > programmed before, and Racket’s forms of data and functions are extremely > suitable to this domain. I think