[Dbpedia-discussion] Introducing Gazetiki (geographical database)

Popescu Adrian Wed, 12 Oct 2011 05:03:26 -0700

Hello,

Gazetiki is a geographical database which was built at CEA LIST 
(http://www-list.cea.fr/gb/index_gb.htm). It contains 8323702 geographical 
names coming from Geonames and from different Web sources, with the latter 
representing over 1 million items.  Aside from the enrichment of the database, 
the main 
modification with respect to Geonames is the addition of a popularity score 
which was calculated based on the usage of a place name in a geotagged dataset. 
If interested, you will find more details 
at http://georama-project.labs.exalead.com/gazetiki.htm
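
To make the idea of a usage-based popularity score concrete, here is a minimal 
sketch. The message does not say how the score is actually computed, so 
everything below is an assumption for illustration: the input format (pairs 
of user id and place name), counting distinct users rather than raw 
occurrences, and scaling by the most used name. The function name 
popularity_scores is hypothetical and not part of Gazetiki.

```python
from collections import defaultdict


def popularity_scores(geotagged_items):
    """Score each place name by how many distinct users tagged it.

    geotagged_items: iterable of (user_id, place_name) pairs.
    ASSUMPTION: this is an illustrative scheme, not Gazetiki's method.
    """
    users_per_place = defaultdict(set)
    for user_id, place_name in geotagged_items:
        # Case-fold so "Paris" and "paris" count as the same name.
        users_per_place[place_name.lower()].add(user_id)
    if not users_per_place:
        return {}
    # Scale to [0, 1] by the most widely used place name.
    max_users = max(len(users) for users in users_per_place.values())
    return {
        place: len(users) / max_users
        for place, users in users_per_place.items()
    }


items = [("u1", "Paris"), ("u2", "Paris"), ("u1", "paris"), ("u3", "Saclay")]
scores = popularity_scores(items)
print(scores)
```

Counting distinct users instead of raw tags keeps one very active user from 
inflating a name's score; whether Gazetiki does this is not stated here.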



In order to improve Gazetiki, your feedback and questions are most welcome. 

Best regards,
Adrian Popescu
http://comupedia.org/adrian


________________________________
From: Mihály Héder <[email protected]>
To: Yury Katkov <[email protected]>
Cc: [email protected]
Sent: Thursday, 6 October 2011, 11:45
Subject: Re: [Dbpedia-discussion] Introducing Sztakipedia

Hello!

1) The toolbar is really a MediaWiki user script (JavaScript), not a
browser extension or something, and you can enable it in your account
right now. Check pedia.sztaki.hu "Enable it in your account".
It communicates with a server endpoint which is provided by us and is
totally public and free (but is in beta! Could not test it with crowd
load yet).
Behind that endpoint there are a couple of servers: UIMA, Solr(Lucene)
and other stuff. That stuff is not a beast you just download and
install, but you don't need it anyway.

2) Well, your second question is a harder one. What I can promise is
that we will come up with some general version you can use, but with
less functionality.

-The categorization relies on Yahoo search. As long as Yahoo indexes
the Wiki of your preferred language we can make it work. (A long-term
issue is that we have to pay a small amount for it - some $4 per 10K
searches - I will try to find someone at Yahoo and ask for their support,
for instance in exchange for putting their logo in the suggestion
window. But right now I don't even have a contact there.)

-Link recommendation relies on tf-idf data and the dbpedia data. To do
the tf-idf calculation we need the XML dump of a certain wiki and have
to run some scripts. It takes about a week in the case of the English
wiki; others are of course much smaller, BUT we need some kind of
stemmer or lemmatizer for the given language - preferably one which we
can integrate with UIMA. We have already integrated Snowball, so in
theory we are able to process any language Snowball supports
(http://snowball.tartarus.org/texts/stemmersoverview.html). If we
don't do the stemming, in theory tf-idf can still work, but problems
arise with languages like Hungarian - where we concatenate funky
suffixes to the words to signal past tense, possessive, modalities,
etc...
From dbpedia we use the list of pages, so it's not optional.

-Infobox recommendation is similar - it relies on the XML dump, and
the corresponding dbpedia infobox data. If we have those we can start
a kind of machine learning (actually done by lucene). To be able to
display the infobox fill form with help, we also need certain xml
files for infoboxes.

-There is a co-occurrence learning phase; it relies on the XML dump and
tf-idf, and is needed for book recommendation. But book search works
without that.

-Book recommendation is quite simple - you can use the english books,
which are often referenced in non-english texts as well. To have
non-english books we need library catalogs in some processable format.
That can be an issue, I have not even found one for Hungarian yet.
However, we could change this part and use library APIs like Z39.50.
There are always performance issues with those but I can see that
sooner or later we need to support those.
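
To illustrate the stemming point in the link recommendation item above, here 
is a minimal sketch of tf-idf with and without stemming. It is not the 
Sztakipedia pipeline (that runs on UIMA with Snowball over a full XML dump); 
the toy_stem function and its suffix list are hypothetical stand-ins for a 
real stemmer, and the three short "documents" are made up for the example.

```python
import math
from collections import Counter

# ASSUMPTION: a tiny suffix list standing in for a real stemmer
# such as Snowball; it is only good enough for this toy example.
SUFFIXES = ("ing", "ed", "s")


def toy_stem(word):
    """Strip one known suffix if at least three characters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text, stem=True):
    """Lowercase, split on whitespace, optionally stem each word."""
    words = [w.lower() for w in text.split()]
    return [toy_stem(w) for w in words] if stem else words


def tf_idf(docs, stem=True):
    """Return one {term: score} dict per document.

    tf is the term count divided by document length;
    idf is log(number of documents / documents containing the term).
    """
    tokenized = [tokenize(d, stem) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


docs = ["links linked linking", "infobox links", "book catalogs"]
stemmed = tf_idf(docs, stem=True)
unstemmed = tf_idf(docs, stem=False)
print(stemmed[0])
print(unstemmed[0])
```

With stemming, the three surface forms in the first document collapse into 
one term, so its weight is concentrated; without stemming, the weight is 
split across three separate terms. In a heavily suffixing language that 
splitting is much more severe, which is the problem described above.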

So to sum up, adding a new language is a piece of work right now and
we need certain resources. However, we will try German and Hungarian
this year. We will try to simplify the process and will do our best
to support more languages.  But we will always need some help from
locals in the given country - library data and testing.

I hope I answered your questions and you will become a happy user!

Best Regards
Mihály

On 5 October 2011 16:56, Yury Katkov <[email protected]> wrote:
> Hi! Great tool for MediaWiki guys like me! Do you have this tool available
> for download? And second question, will it work for non-English languages?
>
> Yury
>
> On Wed, Oct 5, 2011 at 6:48 PM, Mihály Héder <[email protected]> wrote:
>>
>> Hello,
>>
>> We have made an Intelligent Assistant for Wiki which puts dbpedia to
>> good use; you might be interested:
>> http://www.youtube.com/watch?v=_0ochjAwMkw
>>
>> I wanted to share this on this list for several reasons:
>> 1) I wanted to say thank you to everyone who works on dbpedia, I
>> think this is a great achievement.
>> 2) Right now Sztakipedia is branded as an "Intelligent Assistant"
>> which helps you in the boring work of finding links, references,
>> infoboxes, categories etc., while creating a wiki article. But it has
>> been designed as a two-way tool from the very beginning - what I mean
>> by that is that we could have the users help improve dbpedia data,
>> only in a very unobtrusive way of course.
>> 3) I am interested in your thoughts and remarks in general - you
>> surely have good ideas about what could be done with this agent in the
>> editor!
>> 4) And finally, the most important thing : recently I was asked to
>> write a book chapter about the ways of using dbpedia data in mashups.
>> Naturally it is my task to do the research and compile a good overview
>> on how dbpedia is used in the wild as part of web interfaces. I am
>> also familiar with the many white papers on this topic.
>> But I still wanted to ask everyone on this list: What are your
>> favorite applications of dbpedia? In your opinion, what should I
>> emphasize?
>>
>> Thank you!
>>
>> Best Regards
>> Mihály Héder
>> Computer and Automation Research Institute
>> Budapest, Hungary
>>
>>
>> ------------------------------------------------------------------------------
>> All the data continuously generated in your IT infrastructure contains a
>> definitive record of customers, application performance, security
>> threats, fraudulent activity and more. Splunk takes this data and makes
>> sense of it. Business sense. IT sense. Common sense.
>> http://p.sf.net/sfu/splunk-d2dcopy1
>> _______________________________________________
>> Dbpedia-discussion mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>
>
>
> --
> Yury V. Katkov
> WikiVote! llc
>


