Ben,

OpenCyc (http://www.opencyc.org) is released under the Apache license, so you can use its content commercially with no problem. It has the full Cyc ontology, including predicates and their argument constraints, but with an abbreviated lexicon, very few rules, and few propositions between terms. It includes about 12,000 links between Cyc terms and WordNet synonym sets.
Full Cyc can be evaluated by obtaining a no-charge ResearchCyc license, but to use that content commercially you have to negotiate a Cyc license with Cycorp. I am only using OpenCyc content in my project.

-Steve

----- Original Message ----
From: Benjamin Goertzel <[EMAIL PROTECTED]>
To: [email protected]
Sent: Wednesday, April 18, 2007 2:33:57 PM
Subject: Re: [agi] My proposal for an AGI agenda

Stephen --

What do you know about Cyc's licensing terms?

Let's say that Novamente reads Cyc and then learns some things from it ... but then does not retain Cyc in its memory, but only some "derivative knowledge" that is in quite different form...

Is this Novamente system then considered a derivative product of Cyc, so that if an NM instance is licensed, the customer also has to license Cyc?

thx
Ben

On 4/18/07, Stephen Reed <[EMAIL PROTECTED]> wrote:
>
> Hi James,
>
> My development source code is stored in the Subversion repository at SourceForge: http://sf.net/projects/texai . There is no GUI presently because I am concentrating on the server-side functions and want text chat as the system's primary communication modality.
>
> Recently I created my own lexicon that was derived from OpenCyc, WordNet, the CMU Pronouncing Dictionary and from a parsed Wiktionary. I was using MySQL as the database back end to contain these propositions and the application performance suffered as the KB grew over 20 million propositions (9 GB). So I conducted some experiments that demonstrated that Oracle Berkeley DB Java Edition runs very fast for my application if the size of its database is kept below 100,000 propositions.
>
> While at Cycorp, I became familiar with their technique of physically partitioning Cyc in order to excise old portions at a time when system RAM was a limiting factor. So I am accelerating a task that I planned for later on - a peer-to-peer network that partitions the KB.
> Because certain reasonable partitions, such as context imported from WordNet, are over 100,000 propositions in size, I will use the database sharding (table slicing) technique to physically decompose too-large KB partitions. As a result, my current 27 million row MySQL database will be transformed into approximately 10 partitions and approximately 300 shards. For now I will just run all the peers in the same JVM, as my development computer is an AMD X2 5800 with 4 GB RAM running 64-bit Java 6.
>
> Some other miscellaneous details:
> 1. I revised all the KB terms such as symbols, variables, numbers, named-terms and propositions, to be indexed by a 16-byte UUID instead of a 4-byte integer. I had been keeping track of UUIDs for named terms, but now the p2p system will run faster if the IDs are the same in all peers and do not require translation to a local integer term ID.
> 2. Although my Java classes are compatible with J2EE, I am using my own lightweight container for more rapid coding and testing. Likewise the Java Message Service (JMS) will be my peer-to-peer message transport interface, and I will code a very simple implementation that works in a single JVM. When I do go ahead and test a remote peer, for example with my old Windows laptop, I'll probably use Apache ActiveMQ, which is J2EE/JMS compatible.
> 3. I have already integrated the CMU Speech Tools, and imported the CMU Pronouncing Dictionary, so that when I get a text dialog system running, it will be simple to add speech recognition and speech generation. I have hacked the CMU Sphinx speech recognition engine to enable the dialog system to prune the n-best word list incrementally as each phoneme is processed, according to discourse context.
> 4.
>    Unlike Cyc tradition, I am creating only atomic terms, not non-atomic terms, when naming new things, and I am creating only binary propositions even though CycL allows for unary, binary, ternary, quaternary and quintary (5-argument) propositions. This is for efficiency, and for compatibility with the Semantic Web - OWL is limited to atomic terms and binary propositions.
>
> The next step will be to resume work on the first Construction Grammar constructions, chosen to run my simple test cases and subsequently bootstrap the creation of the many remaining constructions. Some have commented here on the failure of parsers to fully understand English text. Again, while at Cycorp, I witnessed the utilization of many English parsers with Cyc: 1. (in-house) simple template parser, 2. (in-house) recursive phrase structure parser, 3. (in-house) head-driven phrase structure parser, 4. Stanford parser, 5. Charniak parser, 6. Link-grammar parser. The best of these work great for simple unambiguous sentences. Cyc post-processing can handle some disambiguation within context.
>
> I am using Rodney Huddleston's "English Grammar: An Outline" to enumerate the required constructions for English. When I investigated Construction Grammar (cf. Croft, Radical Construction Grammar), it appeared not only to solve the deep understanding problem for complex sentences and idioms, but also to fill the missing-parser gap in Walter Kintsch's Construction-Integration approach to text comprehension. So by coupling the two, I hope to achieve deep English text understanding. And because my grammar is reversible, the same constructions (persisted as KB propositions) will drive text generation.
> The basic notion behind Construction Grammar is to abandon an elegant, concise rule-based grammar, and instead use a simple pairing of form and meaning, where the forms are numerous, include idioms and special cases, and only incidentally may be able to share constituents. I'll use Huddleston's text to indicate the required forms and CycL to represent the meanings. I plan to bootstrap the grammar acquisition by hand-coding a dialog system that is designed to acquire more form/meaning pairs. Huddleston gives example phrases for all his identified constructions and I'll create a test suite from these.
>
> -Steve
>
> ----- Original Message ----
> From: James Ratcliff <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Tuesday, April 17, 2007 2:59:35 PM
> Subject: Re: [agi] My proposal for an AGI agenda
>
> Do you have any of this on the Net or in usable form? Or can you post some good screens of it?
>
> James Ratcliff
>
> Stephen Reed <[EMAIL PROTECTED]> wrote:
>
> For my own AI research I am using Java. Apart from its satisfactory speed, I like the NetBeans IDE, and most importantly like all the third-party software libraries that I can plug in. Because my stuff is GPL, there really is a wide variety of compatible software. For example, in the last 9 months I built an object store to contain the OpenCyc ontology and then added WordNet, the lexicon that I parsed from Wiktionary, and the CMU Pronouncing Dictionary. All of this is to support a robust English dialog system that will depend upon a reversible construction grammar now under development. I was able to plug in Hibernate and MySQL to host millions of knowledge base propositions. Once I got above 20 million propositions, performance became noticeably slower. So I am unplugging Hibernate and plugging in Oracle Berkeley DB Java Edition (GPL compatible) and hope to regain ideal performance by using a sharded (physically partitioned) object store.
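A sharded object store of this kind needs a stable rule for deciding which shard holds a given proposition. Here is a minimal sketch in Java of one such rule, hashing a 16-byte UUID key onto a fixed number of shards. The names ShardRouter and shardFor and the shard count of 30 are hypothetical, chosen only for illustration; this is not taken from the Texai code, and a real store would also have to choose the KB partition before choosing a shard within it.

```java
import java.util.UUID;

// Hypothetical sketch: routes a UUID-keyed proposition to one of N shards.
final class ShardRouter {
    private final int shardCount;

    ShardRouter(int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        this.shardCount = shardCount;
    }

    // Returns a shard index in [0, shardCount). The same UUID always maps to
    // the same shard, so every peer can compute the location independently.
    int shardFor(UUID id) {
        // Fold the 128-bit UUID into 64 bits.
        long mixed = id.getMostSignificantBits() ^ id.getLeastSignificantBits();
        // The double modulo keeps the result non-negative for negative input.
        return (int) ((mixed % shardCount + shardCount) % shardCount);
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(30); // illustrative shard count
        UUID id = UUID.randomUUID();
        System.out.println(id + " -> shard " + router.shardFor(id));
    }
}
```

Because the shard index depends only on the ID, no lookup table has to be shared between peers, at the cost of having to move data if the shard count ever changes.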
>
> For deployment I am using J2EE, which is scalable from a single box (where I'm at now) to a cluster to fully distributed.
>
> Regarding self-modifying programs, I prefer that the system intelligently compose its source code and then compile it. I have already experimented with a Java classloader that can replace classes in a JVM on the fly.
>
> I'm building the dialog system so that I can teach the system in English how to do things and so not worry about the programming.
>
> ________________________________
> This list is sponsored by AGIRI: http://www.agiri.org/email
> To unsubscribe or change your options, please go to:
> http://v2.listbox.com/member/?&;
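The compose-then-compile approach to self-modifying programs mentioned in the oldest quoted message can be sketched in Java roughly as follows. This is only an illustration under stated assumptions, not the classloader from the Texai project: the class names HotSwapSketch and Greeter and the method compileAndRun are hypothetical, the sketch uses the standard javax.tools compiler API and therefore needs a JDK rather than a bare JRE, and it uses Files.createTempDirectory and URLClassLoader.close, which require Java 7 or later.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.Writer;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Hypothetical sketch: compose source text, compile it, and load the result
// in a fresh class loader so a newer version can replace an older one.
final class HotSwapSketch {

    // Generates, compiles and loads a class named Greeter, then calls greet().
    // The message must not contain quotes or backslashes (no escaping is done).
    static String compileAndRun(File dir, String message) throws Exception {
        File source = new File(dir, "Greeter.java");
        Writer out = new FileWriter(source);
        try {
            out.write("public class Greeter { public String greet() { return \""
                    + message + "\"; } }");
        } finally {
            out.close();
        }

        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("no system Java compiler; run on a JDK");
        }
        int status = compiler.run(null, null, null,
                "-d", dir.getPath(), source.getPath());
        if (status != 0) {
            throw new IllegalStateException("compilation failed");
        }

        // A new loader per version: a loader never redefines a class it has
        // already defined, so replacement means loading through a new loader.
        URLClassLoader loader =
                new URLClassLoader(new URL[] { dir.toURI().toURL() }, null);
        try {
            Class<?> greeter = loader.loadClass("Greeter");
            Object instance = greeter.getDeclaredConstructor().newInstance();
            return (String) greeter.getMethod("greet").invoke(instance);
        } finally {
            loader.close();
        }
    }

    public static void main(String[] args) throws Exception {
        File dir = Files.createTempDirectory("hotswap").toFile();
        System.out.println(compileAndRun(dir, "version 1"));
        System.out.println(compileAndRun(dir, "version 2"));
    }
}
```

The second call overwrites Greeter.class and loads it through a new loader, so objects created from the first version keep their old behavior while new objects get the new one. Code that holds references to the old class has to be handed the new instances explicitly.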
