Re: [Dbpedia-discussion] Introducing Sztakipedia

Yury Katkov Thu, 06 Oct 2011 12:43:44 -0700

Thank you very much indeed for such a comprehensive answer! And the
planned integration with Semantic MediaWiki is going to be really
awesome!
Yury
On Thu, Oct 6, 2011 at 1:45 PM, Mihály Héder <[email protected]> wrote:
> Hello!
>
> 1) The toolbar is really a MediaWiki user script (JavaScript), not a
> browser extension or something, and you can enable it in your account
> right now. Check pedia.sztaki.hu "Enable it in your account".
> It communicates with a server endpoint which is provided by us and is
> totally public and free (but it is in beta! We could not test it under
> crowd load yet).
> Behind that endpoint there are a couple of servers: UIMA, Solr (Lucene)
> and other stuff. That stuff is not a beast you just download and
> install, but you don't need it anyway.
>
> 2) Well, your second question is a harder one. What I can promise is
> that we will come up with a general version you can use, but with
> less functionality.
>
> -the categorization relies on Yahoo search. As long as Yahoo indexes
> the Wiki of your preferred language we can make it work. (A long-term
> issue is that we have to pay a small amount for it - some $4 per 10K
> searches - I will try to find someone at Yahoo and ask for their
> support, for instance in exchange for putting their logo in the
> suggestion window. But right now I don't even have a contact there.)
>
> -Link recommendation relies on tf-idf data and the dbpedia data. To do
> the tf-idf calculation we need the XML dump of a given wiki and have to
> run some scripts on it. It takes about a week in the case of the English
> wiki; others are of course much smaller, BUT we need some kind of stemmer
> or lemmatizer for the given language - preferably one which we can
> integrate with UIMA. We have already integrated Snowball, so in theory
> we are able to process any language Snowball supports
> (http://snowball.tartarus.org/texts/stemmersoverview.html). If we
> don't do the stemming, in theory tf-idf can still work, but problems
> arise with languages like Hungarian - where we concatenate funky
> suffixes to the words to signal past tense, possessive, modalities,
> etc...
> From dbpedia we use the list of pages, so it's not optional.
>
> -Infobox recommendation is similar - it relies on the XML dump and
> the corresponding dbpedia infobox data. If we have those we can start
> a kind of machine learning (actually done by Lucene). To be able to
> display the infobox fill-in form with help, we also need certain XML
> files for the infoboxes.
>
> -there is a co-occurrence learning phase; it relies on the XML dump and
> tf-idf, and is needed for book recommendation. But book search works
> without that.
>
> -Book recommendation is quite simple - you can use the English books,
> which are often referenced in non-English texts as well. To have
> non-English books we need library catalogs in some processable format.
> That can be an issue; I have not even found one for Hungarian yet.
> However, we could change this part and use library APIs like Z39.50.
> There are always performance issues with those, but I can see that
> sooner or later we will need to support them.
>
> So to sum up, adding a new language is a piece of work right now and
> we need certain resources. However, we will try German and Hungarian
> this year. We will try to simplify the process and will do our best
> to support more languages. But we are always going to need some help
> from locals in the given country - library data and testing.
>
> I hope I answered your questions and you will become a happy user!
>
> Best Regards
> Mihály
>
> On 5 October 2011 16:56, Yury Katkov <[email protected]> wrote:
>> Hi! Great tool for MediaWiki guys like me! Is the tool available for
>> download? And a second question: will it work for non-English languages?
>>
>> Yury
>>
>> On Wed, Oct 5, 2011 at 6:48 PM, Mihály Héder <[email protected]> wrote:
>>>
>>> Hello,
>>>
>>> We have made an Intelligent Assistant for Wiki which puts dbpedia to
>>> good use; you might be interested in it:
>>> http://www.youtube.com/watch?v=_0ochjAwMkw
>>>
>>> I wanted to share this on this list for several reasons:
>>> 1) I wanted to say thank you to everyone who works on dbpedia; I
>>> think this is a great achievement.
>>> 2) Right now Sztakipedia is branded as an "Intelligent Assistant"
>>> which helps you in the boring work of finding links, references,
>>> infoboxes, categories etc., while creating a wiki article. But it has
>>> been designed as a two-way tool from the very beginning - what I mean
>>> by that is that we could have the users help improve dbpedia data,
>>> only in a very unobtrusive way of course.
>>> 3) I am interested in your thoughts and remarks in general - you
>>> surely have good ideas about what could be done with this agent in the
>>> editor!
>>> 4) And finally, the most important thing: recently I was asked to
>>> write a book chapter about the ways of using dbpedia data in mashups.
>>> Naturally it is my task to do the research and compile a good overview
>>> on how dbpedia is used in the wild as part of web interfaces. I am
>>> also familiar with the many white papers on this topic.
>>> But I still wanted to ask everyone on this list: What are your
>>> favorite applications of dbpedia? In your opinion, what should I
>>> emphasize?
>>>
>>> Thank you!
>>>
>>> Best Regards
>>> Mihály Héder
>>> Computer and Automation Research Institute
>>> Budapest, Hungary
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> All the data continuously generated in your IT infrastructure contains a
>>> definitive record of customers, application performance, security
>>> threats, fraudulent activity and more. Splunk takes this data and makes
>>> sense of it. Business sense. IT sense. Common sense.
>>> http://p.sf.net/sfu/splunk-d2dcopy1
>>> _______________________________________________
>>> Dbpedia-discussion mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>>
>>
>>
>> --
>> Yury V. Katkov
>> WikiVote! llc
>>
>
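
For readers following the thread: the point in the quoted message about tf-idf
and stemming can be illustrated with a rough sketch. This is not Sztakipedia's
actual pipeline (which, per the message, runs on UIMA with Snowball); the
`toy_stem` function, the suffix list, and the sample documents are hypothetical
stand-ins chosen only to show why unstemmed inflected forms end up as separate
terms.

```python
import math
from collections import Counter

# Hypothetical toy stemmer: strips a few English suffixes.
# A real setup would use a Snowball stemmer for the target language.
SUFFIXES = ("ing", "ed", "s")

def toy_stem(word):
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def tokenize(text, stem=True):
    # Lowercase, split on whitespace, strip surrounding punctuation.
    words = [w.strip(".,;:!?()\"'").lower() for w in text.split()]
    words = [w for w in words if w]
    return [toy_stem(w) for w in words] if stem else words

def tf_idf(docs, stem=True):
    """Return one {term: weight} dict per document."""
    tokenized = [tokenize(d, stem) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Made-up sample documents, for illustration only.
docs = [
    "The editor links pages and linked categories.",
    "A page about categories.",
    "Infoboxes describe a page.",
]
stemmed = tf_idf(docs, stem=True)
unstemmed = tf_idf(docs, stem=False)
```

With stemming, "links" and "linked" collapse into the single term "link";
without it they are weighted as two unrelated terms. For a language with many
suffixes, such as Hungarian, that splitting would be far more severe.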




-- 
Yury V. Katkov
WikiVote! llc

