Aaron,

Yes, I'd like to put something together once I get a little more feedback (and feel I have the time, etc.).
It would be possible to include basically any existing (narrow-AI) problem category, and I'm actually not opposed to that, but it would be a lot of work to turn it into anything even semi-unified. The big question is: what's the top priority for creating a unified benchmark? It would make sense to apply existing proto-AGI systems to whatever problems in the set they can be applied to. What else would be interesting?

Part of the inspiration was this website: http://nsl.usc.edu/bodb/

It offers a database of neuroscience research, intended to organize the literature in a better way. One interesting feature is its emphasis on "brain operating principles" (BOPs): an attempt to catalog research by the abstract ideas about the brain it supports. The usage statistics at the bottom of the page suggest that this is not a heavily used feature. (Perhaps that's simply because new BOPs are not frequently proposed?) However, it might be interesting to discuss "AGI operating principles" in a similar manner...

On Mon, Dec 31, 2012 at 3:51 PM, Aaron Hosford <[email protected]> wrote:

> Abram,
>
> I think you're on the right track for AGI benchmarking. If someone finds a
> hole in the benchmark suite, it can be patched by adding a new test, but in
> the meantime even the dumbest/narrowest algorithms can be evaluated for
> progress. It's like an ever-growing comprehensive IQ test for AGI
> algorithms, which can continue to grow as the algorithms do.
>
> Maybe it would be useful to put together a site with the informal list
> you've made so far, and offer a convenient way to make additional
> suggestions.
> I think it would also be good to look at existing AI & AGI
> algorithms and see what sorts of problems they are targeted at solving:
>
> - General solution optimization (GA, GP)
> - Planning (SOAR)
> - Inferential reasoning (NARS, theorem provers)
> - Learning from experience (Q-learning, Bayesian probability)
> - Classification learning (XCS, ANN, SVM)
> - Knowledge extraction, question answering (Watson, SIRI)
> - NL conversation (any attempt to pass the Turing Test)
> - etc.
>
> On Thu, Dec 27, 2012 at 11:03 PM, Abram Demski <[email protected]> wrote:
>
>> Nice, thanks! :D
>>
>> I'd still say it's an unfair comparison, since PAQ has had so many years
>> of fine-tuning... I guess a basic PPM would be a better comparison for a
>> basic HMM, and the desired comparison for PAQ would be "what HMM would turn
>> into if it were improved over a number of years by volunteers in a
>> competitive benchmark setting".
>>
>> In any case, this scores major points for PAQ on my already-stated
>> criterion of being successful across a range of tasks with minimal
>> re-fiddling.
>>
>> Arakawa mentioned the textual entailment stuff; that should also go under
>> the category of existing AGI sub-task benchmarks:
>>
>> - PASCAL: http://www.nist.gov/tac/2011/RTE/
>> - NTCIR: http://research.nii.ac.jp/ntcir/ntcir-10/tasks.html
>>
>> On Thu, Dec 27, 2012 at 7:53 PM, Matt Mahoney <[email protected]> wrote:
>>
>>> On Thu, Dec 27, 2012 at 8:34 PM, Abram Demski <[email protected]> wrote:
>>> > Now, I AM disappointed with one thing about Matt's benchmark, when
>>> > viewed in this way: although a number of ideas in the compression world
>>> > (and many combinations/permutations) have been tried, the benchmark does
>>> > not test many ideas from the AI world. For example, I cannot say how well
>>> > PAQ performs against a standard HMM implementation or the many HMM
>>> > variants.
>>> > As a result, there is no way to actually say anything about the
>>> > impact of PAQ on (narrow) AI sequence prediction. This, of course, could
>>> > be remedied by simply trying it out... however, performance would likely
>>> > be poor *simply* because PAQ has been tuned to this problem set over a
>>> > number of years.
>>>
>>> Actually, PAQ is top-ranked on many benchmarks. The algorithm is quite
>>> general-purpose, not just tuned for one text benchmark.
>>>
>>> I think that one reason you don't see a lot of different machine
>>> learning algorithms in compression benchmarks is that they don't do
>>> that well. You might be interested in Byron Knoll's thesis on applying
>>> PAQ to a number of different machine learning problems, such as text
>>> classification, shape recognition, text completion, and game playing.
>>> PAQ beats most of the other algorithms.
>>>
>>> https://circle.ubc.ca/bitstream/handle/2429/35846/ubc_2011_fall_knoll_byron.pdf?sequence=3
>>> (section 4).
>>>
>>> --
>>> -- Matt Mahoney, [email protected]
>>>
>>> -------------------------------------------
>>> AGI
>>> Archives: https://www.listbox.com/member/archive/303/=now
>>> RSS Feed: https://www.listbox.com/member/archive/rss/303/7190161-766c6f07
>>> Modify Your Subscription: https://www.listbox.com/member/?&
>>> Powered by Listbox: http://www.listbox.com
>>
>> --
>> Abram Demski
>> http://lo-tho.blogspot.com/

--
Abram Demski
http://lo-tho.blogspot.com/
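P.S. The apples-to-apples comparison discussed above (PAQ vs. an HMM vs. anything else) is possible because any model that assigns next-symbol probabilities can be scored by the total bits it would need under arithmetic coding. Here is a minimal sketch of that scoring idea using a Laplace-smoothed order-k context model on a toy string; the model and data are illustrative stand-ins, not anything from the benchmarks mentioned in the thread.

```python
import math
from collections import defaultdict

def code_length_bits(text, order):
    """Score a Laplace-smoothed order-k context model on `text` by the
    total -log2 probability it assigns: the ideal compressed size in bits
    under arithmetic coding. Counts update online, so the model always
    predicts a symbol before seeing it."""
    alphabet = sorted(set(text))
    counts = defaultdict(lambda: defaultdict(int))
    total_bits = 0.0
    for i, symbol in enumerate(text):
        context = text[max(0, i - order):i]
        ctx_counts = counts[context]
        ctx_total = sum(ctx_counts.values())
        # Laplace smoothing: every symbol gets a pseudo-count of 1.
        p = (ctx_counts[symbol] + 1) / (ctx_total + len(alphabet))
        total_bits += -math.log2(p)
        ctx_counts[symbol] += 1
    return total_bits

text = "abcabcabcabcabcabcabcabcabcabc"
bits0 = code_length_bits(text, order=0)
bits2 = code_length_bits(text, order=2)
print(f"order-0 model: {bits0:.1f} bits")
print(f"order-2 model: {bits2:.1f} bits")
# The order-2 model exploits the period-3 structure and needs far fewer bits.
```

Any predictor with a `p(symbol | history)` interface, an HMM included, could be dropped into the same loop and ranked on the same scale.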
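P.P.S. Aaron's "patch the suite by adding a new test" idea could be organized as little more than a registry of named scoring functions, with narrow agents simply scoring zero on tasks they cannot attempt. The sketch below is a hypothetical illustration of that mechanism; the task names and the toy agent are mine, not proposals from the thread.

```python
# A registry mapping task names to scoring functions. Closing a "hole"
# in the suite just means registering one more task.
BENCHMARK = {}  # task name -> function(agent) -> score in [0, 1]

def task(name):
    """Decorator that registers a scoring function under `name`."""
    def register(fn):
        BENCHMARK[name] = fn
        return fn
    return register

@task("sequence-prediction")
def score_sequence_prediction(agent):
    seq = [0, 1, 0, 1, 0, 1, 0, 1]
    hits = sum(agent.predict(seq[:i]) == seq[i] for i in range(1, len(seq)))
    return hits / (len(seq) - 1)

@task("classification")
def score_classification(agent):
    labeled = [((0, 0), 0), ((1, 1), 1), ((0, 1), 0), ((1, 0), 1)]
    hits = sum(agent.classify(x) == y for x, y in labeled)
    return hits / len(labeled)

def evaluate(agent):
    """Run the agent on every registered task; agents that lack the
    interface for a task score zero rather than being excluded."""
    results = {}
    for name, fn in BENCHMARK.items():
        try:
            results[name] = fn(agent)
        except (AttributeError, NotImplementedError):
            results[name] = 0.0
    return results

class AlternatingPredictor:
    """A deliberately narrow agent: predicts alternation, cannot classify."""
    def predict(self, history):
        return 1 - history[-1] if history else 0

scores = evaluate(AlternatingPredictor())
print(scores)
```

Even "the dumbest/narrowest algorithms" get a profile across the whole suite this way, and the suite keeps growing as new tests are registered.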
