Aaron,

Yes, I'd like to put something together once I get a little more feedback (and feel I have the time, etc.).
It would be possible to include basically any existing (narrow-AI) problem category, and I'm actually not opposed to that, but it would be a lot of work to turn it into anything even semi-unified. The big question is: what's the top priority for creating a unified benchmark? It would make sense to apply existing proto-AGI systems to whatever problems in the set they can be applied to. What else would be interesting?

Part of the inspiration was this website: http://nsl.usc.edu/bodb/

It offers a database of neuroscience research, intended to organize the literature in a better way. One interesting feature is its emphasis on "brain operating principles" (BOPs): an attempt to catalog research by the abstract ideas about the brain it supports. The usage statistics at the bottom of the page suggest that this is not a heavily used feature. (Perhaps that's simply because new BOPs are not frequently proposed?) However, it might be interesting to discuss "AGI operating principles" in a similar manner...

On Mon, Dec 31, 2012 at 3:51 PM, Aaron Hosford <[email protected]> wrote:

> Abram,
>
> I think you're on the right track for AGI benchmarking. If someone finds a
> hole in the benchmark suite, it can be patched by adding a new test, but in
> the meantime even the dumbest/narrowest algorithms can be evaluated for
> progress. It's like an ever-growing comprehensive IQ test for AGI
> algorithms, which can continue to grow as the algorithms do.
>
> Maybe it would be useful to put together a site with the informal list
> you've made so far, and offer a convenient way to make additional
> suggestions.
> I think it would also be good to look at existing AI & AGI
> algorithms and see what sorts of problems they are targeted at solving:
>
> - General solution optimization (GA, GP)
> - Planning (SOAR)
> - Inferential reasoning (NARS, theorem provers)
> - Learning from experience (Q-learning, Bayesian probability)
> - Classification learning (XCS, ANN, SVM)
> - Knowledge extraction, question answering (Watson, SIRI)
> - NL conversation (any attempt to pass the Turing Test)
> - etc.
>
> On Thu, Dec 27, 2012 at 11:03 PM, Abram Demski <[email protected]> wrote:
>
>> Nice, thanks! :D
>>
>> I'd still say it's an unfair comparison, since PAQ has had so many years
>> of fine-tuning... I guess a basic PPM would be a better comparison for a
>> basic HMM, and the desired comparison for PAQ would be "what HMM would turn
>> into if it were improved over a number of years by volunteers in a
>> competitive benchmark setting".
>>
>> In any case, this scores major points for PAQ on my already-stated
>> criterion of being successful across a range of tasks with minimal
>> re-fiddling.
>>
>> Arakawa mentioned the textual entailment stuff; that should also go under
>> the category of existing AGI sub-task benchmarks:
>>
>> - PASCAL: http://www.nist.gov/tac/2011/RTE/
>> - NTCIR: http://research.nii.ac.jp/ntcir/ntcir-10/tasks.html
>>
>> On Thu, Dec 27, 2012 at 7:53 PM, Matt Mahoney <[email protected]> wrote:
>>
>>> On Thu, Dec 27, 2012 at 8:34 PM, Abram Demski <[email protected]> wrote:
>>> > Now, I AM disappointed with one thing about Matt's benchmark, when
>>> > viewed in this way: although a number of ideas in the compression world
>>> > (and many combinations/permutations) have been tried, the benchmark does
>>> > not test many ideas from the AI world. For example, I cannot say how well
>>> > PAQ performs against a standard HMM implementation or the many HMM
>>> > variants.
>>> > As a result, there is no way to actually say anything about the
>>> > impact of PAQ on (narrow) AI sequence prediction. This, of course, could
>>> > be remedied by simply trying it out... however, performance would likely
>>> > be poor *simply* because PAQ has been tuned to this problem set over a
>>> > number of years.
>>>
>>> Actually, PAQ is top-ranked on many benchmarks. The algorithm is quite
>>> general-purpose, not just tuned for one text benchmark.
>>>
>>> I think that one reason you don't see a lot of different machine
>>> learning algorithms in compression benchmarks is that they don't do
>>> that well. You might be interested in Byron Knoll's thesis on applying
>>> PAQ to a number of different machine learning problems, such as text
>>> classification, shape recognition, text completion, and game playing.
>>> PAQ beats most of the other algorithms.
>>>
>>> https://circle.ubc.ca/bitstream/handle/2429/35846/ubc_2011_fall_knoll_byron.pdf?sequence=3
>>> (section 4).
>>>
>>> --
>>> -- Matt Mahoney, [email protected]
>>>
>>> -------------------------------------------
>>> AGI
>>> Archives: https://www.listbox.com/member/archive/303/=now
>>> RSS Feed: https://www.listbox.com/member/archive/rss/303/7190161-766c6f07
>>> Modify Your Subscription: https://www.listbox.com/member/?&
>>> Powered by Listbox: http://www.listbox.com
>>
>> --
>> Abram Demski
>> http://lo-tho.blogspot.com/

--
Abram Demski
http://lo-tho.blogspot.com/
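P.S. The apples-to-apples comparison discussed above (PAQ vs. an HMM vs. anything else) is possible because any model that assigns next-symbol probabilities can be scored by the total bits it would need under arithmetic coding. Here is a minimal sketch of that scoring idea using a Laplace-smoothed order-k context model on a toy string; the model and data are illustrative stand-ins, not anything from the benchmarks mentioned in the thread.

```python
import math
from collections import defaultdict

def code_length_bits(text, order):
    """Score a Laplace-smoothed order-k context model on `text` by the
    total -log2 probability it assigns: the ideal compressed size in bits
    under arithmetic coding. Counts update online, so the model always
    predicts a symbol before seeing it."""
    alphabet = sorted(set(text))
    counts = defaultdict(lambda: defaultdict(int))
    total_bits = 0.0
    for i, symbol in enumerate(text):
        context = text[max(0, i - order):i]
        ctx_counts = counts[context]
        ctx_total = sum(ctx_counts.values())
        # Laplace smoothing: every symbol gets a pseudo-count of 1.
        p = (ctx_counts[symbol] + 1) / (ctx_total + len(alphabet))
        total_bits += -math.log2(p)
        ctx_counts[symbol] += 1
    return total_bits

text = "abcabcabcabcabcabcabcabcabcabc"
bits0 = code_length_bits(text, order=0)
bits2 = code_length_bits(text, order=2)
print(f"order-0 model: {bits0:.1f} bits")
print(f"order-2 model: {bits2:.1f} bits")
# The order-2 model exploits the period-3 structure and needs far fewer bits.
```

Any predictor with a `p(symbol | history)` interface, an HMM included, could be dropped into the same loop and ranked on the same scale.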
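P.P.S. Aaron's "patch the suite by adding a new test" idea could be organized as little more than a registry of named scoring functions, with narrow agents simply scoring zero on tasks they cannot attempt. The sketch below is a hypothetical illustration of that mechanism; the task names and the toy agent are mine, not proposals from the thread.

```python
# A registry mapping task names to scoring functions. Closing a "hole"
# in the suite just means registering one more task.
BENCHMARK = {}  # task name -> function(agent) -> score in [0, 1]

def task(name):
    """Decorator that registers a scoring function under `name`."""
    def register(fn):
        BENCHMARK[name] = fn
        return fn
    return register

@task("sequence-prediction")
def score_sequence_prediction(agent):
    seq = [0, 1, 0, 1, 0, 1, 0, 1]
    hits = sum(agent.predict(seq[:i]) == seq[i] for i in range(1, len(seq)))
    return hits / (len(seq) - 1)

@task("classification")
def score_classification(agent):
    labeled = [((0, 0), 0), ((1, 1), 1), ((0, 1), 0), ((1, 0), 1)]
    hits = sum(agent.classify(x) == y for x, y in labeled)
    return hits / len(labeled)

def evaluate(agent):
    """Run the agent on every registered task; agents that lack the
    interface for a task score zero rather than being excluded."""
    results = {}
    for name, fn in BENCHMARK.items():
        try:
            results[name] = fn(agent)
        except (AttributeError, NotImplementedError):
            results[name] = 0.0
    return results

class AlternatingPredictor:
    """A deliberately narrow agent: predicts alternation, cannot classify."""
    def predict(self, history):
        return 1 - history[-1] if history else 0

scores = evaluate(AlternatingPredictor())
print(scores)
```

Even "the dumbest/narrowest algorithms" get a profile across the whole suite this way, and the suite keeps growing as new tests are registered.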
