Well, there are several things to test here:

-- the semantic mapping

-- the language generation

-- the overall behavior of the dialogue system

I was talking about testing of the first two...

Regarding the overall behavior of the dialogue system,
question-answering, as you mention, is part of the story.

Another part is the system describing what is happening to a
virtual-world agent that it's controlling....

Another part is question-ASKING ... when the virtual-world agent needs
something in the game world, does it ask for it?

For each of these behaviors, one can of course make lists of desired
behaviors, e.g.

-- question/answer pairs

-- situation/description pairs

-- need/question-asked pairs

and one can rate how many of them the system can successfully execute...
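For illustration, rating a system against such behavior lists could be as simple as the following sketch. Everything here is a hypothetical stand-in (`score_behaviors`, the `system` callable, the toy Q/A pairs) rather than an actual OpenCog interface:

```python
# Sketch: score a dialogue system against a list of desired
# (input, expected_output) behavior pairs, using exact match
# for simplicity. Real scoring would need fuzzier matching.

def score_behaviors(system, pairs):
    """Return the fraction of pairs the system handles successfully."""
    if not pairs:
        return 0.0
    hits = sum(1 for inp, expected in pairs if system(inp) == expected)
    return hits / len(pairs)

# Toy question/answer pairs, purely for illustration:
qa_pairs = [("What color is the ball?", "red"),
            ("Where is the battery?", "under the table")]

# A toy "system" that just looks answers up in a table:
toy_answers = dict(qa_pairs)
toy_system = lambda q: toy_answers.get(q, "I don't know")

print(score_behaviors(toy_system, qa_pairs))  # → 1.0
```

The same scoring function would apply unchanged to situation/description or need/question-asked pairs; only the pair lists and the system interface differ.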

But this is not really a terribly rigorous sort of test either,
because of course one can always tweak the system so as to perform
well on the specific behaviors in the test lists....   This becomes
not unlike the kind of tweaking the IBM folks did with Watson, except
that there they had a clear "out of sample" way to test the system,
i.e. on the new Jeopardy questions being posed...

Some level of formal testing will be done for this particular
subsystem of OpenCog, because it's part of Ruiting's PhD thesis... and
theses need to have formal test results....  But I don't really think
the formal testing will be key for driving the work forward, in this
case...

Regarding unit tests for the code, we will likely pick some simple
sentences and situations, and be sure the system can handle them
appropriately.  This will be a stupidity-test against future revisions
inadvertently really fucking up the functionality....   OpenCog is
reasonably good in terms of having unit tests for components. For
instance, when Jade replaced the C++ version of PLN with a Python
version, she validated that it could still perform all the inferences
in the unit tests...
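A minimal sketch of what such a stupidity-test might look like, assuming a hypothetical `handle_sentence` entry point and canned outputs (the real subsystem's interface and representations will differ):

```python
import unittest

# Hypothetical placeholder for the real semantic-mapping /
# language-generation pipeline; it returns canned results so the
# test structure is runnable on its own.
def handle_sentence(sentence):
    canned = {
        "Pick up the ball.": "pick_up(ball)",
        "Where is the red ball?": "query_location(ball, red)",
    }
    return canned.get(sentence)

class SimpleSentenceRegressionTest(unittest.TestCase):
    """Pin a few simple sentences so future revisions can't
    silently break the basic functionality."""

    def test_simple_command(self):
        self.assertEqual(handle_sentence("Pick up the ball."),
                         "pick_up(ball)")

    def test_simple_question(self):
        self.assertEqual(handle_sentence("Where is the red ball?"),
                         "query_location(ball, red)")
```

Run with `python -m unittest`; the point is not the toy mapping but pinning a fixed set of expected behaviors that every revision must preserve.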

In our use of MOSES for genetic or financial analysis, OTOH, formal
quantitative testing is critical for driving the work forward....  So
I'm certainly not opposed to rigorous, quantitative testing as a
general principle!!


-- Ben G





On Thu, Dec 27, 2012 at 11:22 AM, Matt Mahoney <[email protected]> wrote:
> On Thu, Dec 27, 2012 at 11:12 AM, Ben Goertzel <[email protected]> wrote:
>> I think we will need to have a handful of expert humans rate the
>> results on a small test corpus.
>
> What about preparing a set of questions and answers in advance?
>
> I am thinking about eliminating a source of bias, namely "I didn't
> think of that answer, but it is close enough". Of course this
> introduces a second bias, namely tuning the system to pass the test.
> For that you would need to withhold part of the test questions until
> the end.
>
> The problem I really want to avoid is building something according to
> plan that ends up not doing anything useful. This way we will know if
> it is useful or not before we build it.
>
>
> --
> -- Matt Mahoney, [email protected]
>



-- 
Ben Goertzel, PhD
http://goertzel.org

"My humanity is a constant self-overcoming" -- Friedrich Nietzsche

