Peter Kostelnik, PhD.
Mon, 10 May 2010 04:59:49 -0700
hi, barry ..
thanks a lot for the prompt answer ..
anyway, do you plan to:
- integrate lucene index generator into BigOWLIM's importing mechanism
- support custom analysers and scorers
- define selected FT predicates
?
as for the data-type stuff - the question was just whether you are using e.g.
NumericField for non-string literals, etc. .. but this is just minor ..
anyway, thanks a lot for the useful info ..
cheers,
Peter K.
>> but, as far as I know, there should be some repository parameter to set to
>> enable the full-text search, shouldn't there? (the repository, when starting
>> up, reports that Lucene search is disabled by default .. but all of us
>> know that lucene-core-3.x is included :) )
>
> Lucene (core) is indeed bundled with OWLIM. However, for new
> installations, Lucene queries will not work until the Lucene index is
> built (along the lines that Naso has already described). For the current
> (preliminary) integration with Lucene, this can only be done on the
> command line and only for 'local' repositories, i.e. it can't be done
> through the Sesame API. It goes like this:
>
> java -DrepositoryPath=<path> com.ontotext.trree.GenerateLuceneIndex
>
> Of course, a classpath parameter is also required that includes the
> BigOWLIM and Lucene jar files.
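
To make the classpath requirement concrete, here is a minimal Python sketch that assembles the command line quoted above. The jar file names and the repository path are hypothetical placeholders (the real names depend on the BigOWLIM distribution and are not given in this thread); only the -DrepositoryPath property and the com.ontotext.trree.GenerateLuceneIndex class come from Barry's message.

```python
import os


def build_index_command(repository_path, jar_files):
    """Assemble the argument list for the index generator quoted above.

    jar_files are placeholders: the real BigOWLIM and Lucene jar names
    depend on the distribution and are not given in this thread.
    """
    # Java expects classpath entries joined by the platform separator
    # (":" on Unix-like systems, ";" on Windows).
    classpath = os.pathsep.join(jar_files)
    return [
        "java",
        "-cp", classpath,
        "-DrepositoryPath=" + repository_path,
        "com.ontotext.trree.GenerateLuceneIndex",
    ]


if __name__ == "__main__":
    # Hypothetical jar names and repository path, for illustration only.
    command = build_index_command(
        "/data/owlim-storage",
        ["lib/bigowlim.jar", "lib/lucene-core.jar"],
    )
    print(" ".join(command))
```

The returned list could be handed to subprocess.run on the machine that hosts the local repository, since the index can only be built from the command line there.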
>
>> (unusually) just a few questions:
>>
>> which predicates are indexed? .. all literals or is there possibility to
>> cut the index only to selected predicates?
>
> For the current implementation, it is all or nothing, i.e. the entire
> repository. Building the index as above will index every literal and the
> local name of every URI.
>
>> is it possible to fire any lucene query (e.g. fuzzy queries "get~0.2 &&
>> me~0.7" etc.)?
>
> The full Lucene syntax is used:
> http://lucene.apache.org/java/3_0_0/queryparsersyntax.html
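
As an illustration of sending such a query through SPARQL, the Python sketch below wraps a Lucene query in a SPARQL string around the ldsr:luceneQuery predicate that appears in Naso's example further down the thread. It assumes the Lucene query is passed as a plain string literal, as in that example, and that the same prefix applies to the repository being queried; the helper names are made up for illustration.

```python
def sparql_string_literal(text):
    """Quote text as a SPARQL double-quoted string literal."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return '"' + escaped + '"'


def build_lucene_sparql(lucene_query, limit=100):
    """Build a SELECT query that binds ?u to nodes matching lucene_query.

    Assumption: the ldsr prefix and predicate are taken from the LDSR
    example in this thread and may differ in other repositories.
    """
    literal = sparql_string_literal(lucene_query)
    return (
        "PREFIX ldsr: <http://www.ontotext.com/>\n"
        "SELECT ?u WHERE {\n"
        "  ?u ldsr:luceneQuery " + literal + " .\n"
        "} LIMIT " + str(int(limit))
    )


if __name__ == "__main__":
    # The fuzzy query from the question above.
    print(build_lucene_sparql("get~0.2 && me~0.7"))
```

The escaping step matters mainly for Lucene phrase queries, which contain double quotes of their own.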
>
>> is it possible to plug in your own analyser?
>
> Not yet.
>
>> is it possible to modify the scoring function?
>
> Not yet.
>
>> are you handling data-types in FTI?
>
> I'm not sure what you mean. There is no special handling of data-types,
> but all literals are indexed.
>
>> I know .. a lot of stuff (and even so I've got the feeling that I've
>> forgotten something :) ) ..
>>
>> thanks in advance, have a nice weekend ..
>>
>> cheers,
>> Peter K.
>
> Have a good weekend yourself!
> barry
>
>>> Hi Spyros,
>>>
>>> we should have provided such an example. Yes, there is a way to perform
>>> "RDF search" from a SPARQL query. Here is the example:
>>>
>>> PREFIX ldsr: <http://www.ontotext.com/>
>>>
>>> SELECT * WHERE {
>>> ?u ldsr:luceneQuery "Amsterdam" ; ldsr:preferredLabel ?l ;
>>> ldsr:hasPageRank ?pr; ldsr:textSnippet ?snip .
>>> } LIMIT 100
>>>
>>> Essentially, it is a matter of using a system predicate, as demonstrated
>>> above. The query also illustrates the use of a few other system
>>> predicates.
>>>
>>> What does the luceneQuery predicate do?
>>> During the indexing of the LDSR repository, we do the following: for each
>>> node, we collect all the strings of its molecule and concatenate them into
>>> a single piece of text. Then we pass each of these "text molecules" to
>>> Lucene for indexing. We do some simple tricks to "tell" Lucene which is the
>>> URI and to boost the importance of the strings appearing in labels (as
>>> opposed to, say, comments). We also use the RDF Rank of the nodes as a
>>> boost factor.
>>> So, the result is that you get as bindings for ?u in
>>>
>>> ?u ldsr:luceneQuery "term"
>>>
>>> a series of URIs, ordered by Lucene's judgement of their relevance to the
>>> query. This essentially means the standard VSM (vector space model), with
>>> the boosts that I mentioned above. This is what we call RDF Search - it
>>> allows one to retrieve RDF nodes by keywords. One can use the full
>>> expressivity of the Lucene query language.
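
To make the indexing description above more concrete, here is a toy Python sketch of building "text molecules". Everything in it is an assumption rather than the LDSR implementation: a node's molecule is taken to be the literals attached directly to it, label strings are simply repeated to give them more weight (a stand-in for the unspecified "simple tricks"), and the rank values are invented.

```python
from collections import defaultdict

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RDFS_COMMENT = "http://www.w3.org/2000/01/rdf-schema#comment"


def build_text_molecules(triples, ranks, label_repeat=2):
    """Return one indexable document per subject node.

    triples: iterable of (subject_uri, predicate_uri, literal_string).
    ranks: mapping from node URI to a rank value used as document boost.
    """
    strings = defaultdict(list)
    for subject, predicate, literal in triples:
        # Assumption: repeating label text stands in for a field boost.
        copies = label_repeat if predicate == RDFS_LABEL else 1
        strings[subject].extend([literal] * copies)

    documents = []
    for uri in sorted(strings):
        documents.append({
            "uri": uri,                       # lets a hit map back to the node
            "text": " ".join(strings[uri]),   # the concatenated "text molecule"
            "boost": ranks.get(uri, 1.0),     # hypothetical rank-based boost
        })
    return documents


if __name__ == "__main__":
    # Invented sample data, for illustration only.
    sample = [
        ("http://example.org/Amsterdam", RDFS_LABEL, "Amsterdam"),
        ("http://example.org/Amsterdam", RDFS_COMMENT,
         "Capital of the Netherlands"),
    ]
    for doc in build_text_molecules(sample, {"http://example.org/Amsterdam": 0.9}):
        print(doc)
```

Each resulting dictionary corresponds to what would be handed to a full-text indexer as one document, so that a keyword hit can be mapped back to a single RDF node.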
>>>
>>> We will document all these "stealth" features in the upcoming BigOWLIM 3.3
>>> release. For instance, there is an alternative, proprietary FT indexing
>>> and search method, which implements plain but very efficient FTS for
>>> literals. Stay tuned!
>>>
>>> Meanwhile, enjoy LDSR :-)
>>>
>>> Naso
>>>
>>> ----------------------------------------------------------
>>> Atanas Kiryakov
>>> Executive Director of Ontotext AD, http://www.ontotext.com
>>> Sirma Group, http://www.sirma.bg
>>> Phone: (+359 2) 974 61 44; Fax: 975 3226
>>> ----------------------------------------------------------
>>> There is no mental process that can change the laws of nature or erase
>>> facts.
>>> The function of consciousness is not to create reality, but to
>>> apprehend
>>> it.
>>> "Existence is Identity, Consciousness is Identification."
>>> Ayn Rand
>>> ----- Original Message -----
>>> From: "Spyros Kotoulas" <k...@few.vu.nl>
>>> To: <owlim-discussion@ontotext.com>
>>> Sent: Friday, May 07, 2010 4:05 PM
>>> Subject: [Owlim-discussion] Keyword search in BigOWLIM
>>>
>>>
>>>> Hi All,
>>>>
>>>> Is there a way to combine keyword search with SPARQL in BigOWLIM?
>>>> For example, in the LDSR repository, there is an RDF search box which
>>>> answers queries fast. If I try regular expressions in SPARQL, the
>>>> performance is very bad. Is there a way to combine the two? For example,
>>>> how can I get all the triples that contain URIs with the word
>>>> "Amsterdam"?
>>>>
>>>> -Spyros
>>>> _______________________________________________
>>>> OWLIM-discussion mailing list
>>>> OWLIM-discussion@ontotext.com
>>>> http://ontotext.com/mailman/listinfo/owlim-discussion