Thank you very much indeed for such a comprehensive answer! And the planned integration with Semantic MediaWiki is going to be really awesome!

Yury

On Thu, Oct 6, 2011 at 1:45 PM, Mihály Héder <[email protected]> wrote:
> Hello!
>
> 1) The toolbar is really a MediaWiki user script (JavaScript), not a browser extension or anything like that, and you can enable it in your account right now. Check "Enable it in your account" on pedia.sztaki.hu.
> It communicates with a server endpoint which is provided by us and is totally public and free (but it is in beta! We could not test it under crowd load yet).
> Behind that endpoint there are a couple of servers: UIMA, Solr (Lucene) and other stuff. That stuff is not a beast you can just download and install, but you don't need it anyway.
>
> 2) Well, your second question is a harder one. What I can promise is that we will come up with a general version you can use, albeit with less functionality.
>
> - The categorization relies on Yahoo search. As long as Yahoo indexes the wiki in your preferred language, we can make it work. (A long-term issue is that we have to pay a small amount for it, some $4 per 10K searches. I will try to find someone at Yahoo and ask for their support, for instance in exchange for putting their logo in the suggestion window. But right now I don't even have a contact there.)
>
> - Link recommendation relies on tf-idf data and the DBpedia data. To do the tf-idf calculation we need the XML dump of the given wiki, and then we run some scripts. This takes about a week for the English wiki (the others are of course much smaller), BUT we need some kind of stemmer or lemmatizer for the given language, preferably one we can integrate with UIMA. We have already integrated Snowball, so in theory we can process any language Snowball supports (http://snowball.tartarus.org/texts/stemmersoverview.html).
> If we don't do the stemming, tf-idf can in theory still work, but problems arise with languages like Hungarian, where we attach funky suffixes to words to signal past tense, possession, modality, etc.
> From DBpedia we use the list of pages, so that part is not optional.
>
> - Infobox recommendation is similar: it relies on the XML dump and the corresponding DBpedia infobox data. If we have those, we can start a kind of machine learning (actually done by Lucene). To display the infobox form with fill-in help, we also need certain XML files for the infoboxes.
>
> - There is a co-occurrence learning phase; it relies on the XML dump and tf-idf, and is needed for book recommendation. Book search, however, works without it.
>
> - Book recommendation is quite simple: you can use the English books, which are often referenced in non-English texts as well. To have non-English books we need library catalogs in some processable format. That can be an issue; I have not even found one for Hungarian yet. However, we could change this part and use library interfaces such as the Z39.50 protocol. There are always performance issues with those, but I can see that sooner or later we will need to support them.
>
> So to sum up, adding a new language takes a fair amount of work right now, and we need certain resources. However, we will try German and Hungarian this year. We will try to simplify the process and will do our best to support more languages. But we will always need some help from locals in the given country, with library data and testing.
>
> I hope I answered your questions and that you will become a happy user!
>
> Best regards
> Mihály
>
> On 5 October 2011 16:56, Yury Katkov <[email protected]> wrote:
>> Hi! Great tool for MediaWiki guys like me! Do you have this tool available for download? And second question: will it work for non-English languages?
>>
>> Yury
>>
>> On Wed, Oct 5, 2011 at 6:48 PM, Mihály Héder <[email protected]> wrote:
>>>
>>> Hello,
>>>
>>> We have made an Intelligent Assistant for Wiki which puts DBpedia to good use; you might be interested in it:
>>> http://www.youtube.com/watch?v=_0ochjAwMkw
>>>
>>> I wanted to share this on this list for several reasons:
>>> 1) I wanted to say thank you to everyone who works on DBpedia; I think it is a great achievement.
>>> 2) Right now Sztakipedia is branded as an "Intelligent Assistant" which helps you with the boring work of finding links, references, infoboxes, categories, etc. while creating a wiki article. But it has been designed as a two-way tool from the very beginning: what I mean is that we could have users help improve DBpedia data, in a very non-obtrusive way of course.
>>> 3) I am interested in your thoughts and remarks in general; you surely have good ideas about what could be done with this agent in the editor!
>>> 4) And finally, the most important thing: recently I was asked to write a book chapter about the ways of using DBpedia data in mashups. Naturally it is my task to do the research and compile a good overview of how DBpedia is used in the wild as part of web interfaces. I am also familiar with the many white papers on this topic.
>>> But I still wanted to ask everyone on this list: what are your favorite applications of DBpedia? In your opinion, what should I emphasize?
>>>
>>> Thank you!
>>>
>>> Best regards
>>> Mihály Héder
>>> Computer and Automation Research Institute
>>> Budapest, Hungary
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> All the data continuously generated in your IT infrastructure contains a
>>> definitive record of customers, application performance, security
>>> threats, fraudulent activity and more. Splunk takes this data and makes
>>> sense of it. Business sense.
>>> IT sense. Common sense.
>>> http://p.sf.net/sfu/splunk-d2dcopy1
>>> _______________________________________________
>>> Dbpedia-discussion mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>>
>>
>> --
>> Yury V. Katkov
>> WikiVote! llc
>>
>
--
Yury V. Katkov
WikiVote! llc
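Mihály's remark that tf-idf can work without stemming but runs into trouble with Hungarian can be made concrete with a small sketch. This is not the Sztakipedia implementation (which, per the thread, runs on UIMA with Snowball). It assumes whitespace tokenization, the common weighting tf × log(N / df), and a hypothetical `toy_stem` helper whose hand-picked suffix list only stands in for a real stemmer. Without stemming, "ház" (house), "házban" (in the house) and "házak" (houses) are three unrelated terms; with stemming they collapse into one term whose document frequency reflects all three.

```python
from __future__ import annotations

import math
from collections import Counter

# HYPOTHETICAL toy suffix list, for illustration only. It is NOT a real
# Hungarian stemmer; a real system would plug in e.g. a Snowball stemmer.
TOY_SUFFIXES = ("ban", "ben", "nak", "nek", "at", "ak", "ek")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 chars."""
    for suffix in TOY_SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text: str, stem: bool) -> list[str]:
    """Lower-case, split on whitespace, and optionally stem every token."""
    words = text.lower().split()
    return [toy_stem(w) for w in words] if stem else words


def tf_idf(docs: list[str], stem: bool = False) -> list[dict[str, float]]:
    """Return one {term: score} dict per document, where
    tf  = term count / document length and
    idf = log(N / number of documents containing the term)."""
    tokenized = [tokenize(d, stem) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # each term counted once per document
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Toy Hungarian corpus: "a" is the definite article; ház = house,
# házban = in the house, házak = houses.
docs = ["a ház nagy", "a házban ülök", "a házak régiek", "a kert szép"]
plain = tf_idf(docs)
stemmed = tf_idf(docs, stem=True)
print(sorted(plain[1]))    # ['a', 'házban', 'ülök']
print(sorted(stemmed[1]))  # ['a', 'ház', 'ülök']
```

In practice `toy_stem` would be replaced by a proper stemmer or lemmatizer; its crude suffix list will over- and under-stem many real words. The point is only that, without some normalization, every inflected form gets its own (artificially rare) document frequency.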
