Hi Prerak,
interesting, good.

If you put a demo or share your code somewhere in the future, let us know.

Paolo

prerak pradhan wrote:
> Hey Paolo, 
> Sorry for the late reply, I got quite busy here. I have been using the 
> Stanford NLP library along with Stanford NER and their dependency parser. 
> To map user queries into SPARQL queries, the application I am working on 
> basically recognizes proper nouns or entities in the user input and tries 
> to use the dependency parser to guess what property/relationship of that 
> entity the user is trying to search for.
> 
>  For example, if the user puts in the line: 
> "Managers of Manchester United with their active date"
> the application would recognize Manchester United as a proper noun or an 
> entity, and since the keyword "Manager" has a direct relationship here, 
> i.e. the dependency "prep_of", it would try to find their managers with 
> the first query and then, on the second run, would try to add in their 
> active date.
> The algorithm for this is still very much under development and I am 
> currently testing it on more complex queries. So lots of hair pulling and 
> head banging still to go ;)
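> 
> Roughly, the first pass ends up building a query like the sketch below 
> (simplified, with placeholder URIs; not my actual code):
> 
>   // Sketch only: the entity comes from NER, the property is guessed
>   // from the keyword plus the dependency ("Manager" + prep_of).
>   // Query/QueryFactory are from com.hp.hpl.jena.query.
>   String entity = "http://dbpedia.org/resource/Manchester_United_F.C.";
>   String prop   = "http://dbpedia.org/ontology/manager";
>   String sparql =
>       "SELECT ?value WHERE { <" + entity + "> <" + prop + "> ?value . }";
>   Query query = QueryFactory.create(sparql);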
> 
> Prerak
> 
> 
> 
> ________________________________
>  From: Paolo Castagna <castagna.li...@googlemail.com>
> To: jena-users@incubator.apache.org 
> Sent: Friday, April 6, 2012 1:30 AM
> Subject: Re: Loading DBpedia datasets
>  
> prerak pradhan wrote:
>> Thanks for the input, Paolo, and yes, I did use tdbloader2 on Ubuntu and 
>> then transferred the directory onto Windows. Loading seems a lot faster on 
>> Linux using tdbloader2. 
> 
> Yep.
> 
>> I am working on a mini-project which aims to develop an application that 
>> performs semantic search on DBpedia data, something like Kngine or Hakia 
>> on a very, very small scale. I think I have got the natural language 
>> processing part done and am now trying to work on forming SPARQL queries 
>> to be run against the DBpedia dataset, based on the NLP output for the 
>> user-entered query. Do you have any links or resources in this regard? 
>> Thanks again, I appreciate it.
> 
> Oh, well... another "semantic search" project. :-)
> 
> I do not have more links than what you would find on the Wikipedia page
> on "semantic search" and in the references there.
> 
> By the way, which NLP library are you using?
> How do you generate a SPARQL query from natural language?
> 
> In the distant past, I tried to exploit the prefix:keyword pattern
> that many search engines support, and which people might therefore
> already be used to, to do something vaguely similar but much (much)
> simpler (no NLP involved):
> 
>   type:car color:blue model:touran city:london ...
> 
> It works for very simple queries, but it quickly becomes impractical.
> It would, however, be an improvement for certain types of searches.
> Even supporting just type:{book|person|city|...} can be useful.
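> 
> Something like the sketch below gives the idea (the namespaces are made
> up, this is not a real implementation):
> 
>   // Split the input on whitespace, then each pair on ':', and turn
>   // every pair into a SPARQL triple pattern.
>   String input = "type:car color:blue model:touran";
>   StringBuilder where = new StringBuilder();
>   for (String pair : input.split("\\s+")) {
>       String[] kv = pair.split(":", 2);
>       if (kv[0].equals("type")) {
>           // the value names a class
>           where.append("?s a <http://example.org/class/")
>                .append(kv[1]).append("> . ");
>       } else {
>           // everything else becomes a property/literal pair
>           where.append("?s <http://example.org/prop/").append(kv[0])
>                .append("> \"").append(kv[1]).append("\" . ");
>       }
>   }
>   String sparql = "SELECT ?s WHERE { " + where + "}";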
> 
> Paolo
> 
>>
>> ________________________________
>>   From: Paolo Castagna <castagna.li...@googlemail.com>
>> To: jena-users@incubator.apache.org 
>> Sent: Thursday, April 5, 2012 8:36 AM
>> Subject: Re: Loading DBpedia datasets
>>   
>> prerak pradhan wrote:
>>> Hello there, I am just starting off with Jena and am pretty new to it. I 
>>> am trying to load all the DBpedia datasets so that I can have a local 
>>> version of DBpedia working on my station here. I used the TDB loader to 
>>> load the datasets, and while doing so I specified a directory in which to 
>>> load the dataset. I used the following code to query the dataset:
>>>     // Jena imports: com.hp.hpl.jena.query.*, com.hp.hpl.jena.rdf.model.*
>>>     // and com.hp.hpl.jena.tdb.TDBFactory
>>>     String directory = "c:/dataset";
>>>     Dataset dataset = TDBFactory.createDataset(directory);
>>>     Model model = dataset.getDefaultModel();
>>>     String q = "SELECT ?p ?o WHERE { " +
>>>         "<http://dbpedia.org/resource/Mendelian_inheritance> ?p ?o . }";
>>>     Query query = QueryFactory.create(q);
>>>     QueryExecution qexec = QueryExecutionFactory.create(query, model);
>>>     try {
>>>       ResultSet results = qexec.execSelect();
>>>       while (results.hasNext()) {
>>>         QuerySolution result = results.nextSolution();
>>>         RDFNode p = result.get("p");  // only ?p and ?o are bound here
>>>         RDFNode o = result.get("o");
>>>         System.out.println(" { " + p + " " + o + " . }");
>>>       }
>>>     } finally {
>>>       qexec.close();
>>>     }
>>> Now my question is: the DBpedia data dumps come in various files, so do I 
>>> load all these files into the same directory using TDB to create one huge 
>>> model, or do I need to load them into different directories, thus having 
>>> to create different models to query the data? Please note that I do not 
>>> plan to load the whole of the DBpedia datasets into the datastore, just 
>>> the English version of Ontology Infobox Properties, Titles and Ontology 
>>> Infobox Types. Forgive me for my very amateur question, but I am just 
>>> getting started with it ;). 
>> Hi Prerak,
>> first of all, welcome to the Jena mailing list.
>>
>> DBpedia is one of the "not so small" RDF data dumps around, so it's better
>> to check that you are using a 64-bit OS and JVM, and that you have a decent
>> amount of RAM on that machine. A few more details here:
>> http://incubator.apache.org/jena/documentation/tdb/jvm_64_32.html
>>
>> Then, allow me to suggest that you read about RDF datasets here:
>> http://www.w3.org/TR/sparql11-query/#rdfDataset
>>
>> TDB supports RDF datasets; the documentation is here:
>> http://incubator.apache.org/jena/documentation/tdb/datasets.html
>>
>> So, you can load the entire DBpedia data into a single TDB location on disk
>> (i.e. a single directory). This way, you can run SPARQL queries over all of
>> it. This is, in my opinion, the best option.
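>>
>> For example, something like this (the location and file names below are
>> just placeholders for your own):
>>
>>   tdbloader --loc=c:/dataset dump1.nt dump2.nt dump3.nt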
>>
>> You could use named graphs; read more about the N-Quads serialization
>> format here:
>> http://sw.deri.org/2008/07/n-quads/
>>
>> And, in relation to DBPedia, here:
>> http://wiki.dbpedia.org/Datasets#h18-18
>>
>> You might decide to create your own named graphs and load parts of DBpedia
>> into them to support your data management needs, rather than taking the
>> named graphs given to you by DBpedia to track provenance.
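>>
>> A SPARQL query can then target a single named graph, for example (the
>> graph URI here is invented, just to show the pattern):
>>
>>   SELECT ?s WHERE {
>>     GRAPH <http://example.org/graphs/infobox-types> { ?s a ?type }
>>   }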
>>
>> Finally, with datasets of the size of DBpedia, tdbloader2 should be a
>> better choice than tdbloader, but you seem to be using Windows, therefore
>> tdbloader2 is not a good choice for you. You could have a look at
>> tdbloader3 as well, if you have problems with tdbloader. But, try
>> tdbloader first. You can also load the data on a server with a decent
>> amount of RAM and move the files around as you need.
>>
>> What are you planning to do with DBpedia loaded locally?
>>
>> I hope this helps and let me know how it goes,
>> Paolo
