owlim-discussion  

Re: [Owlim-discussion] Keyword search in BigOWLIM

Ivan Peikov
Tue, 11 May 2010 00:16:28 -0700

Sure Jem, this is a nice enhancement to the FT-indexing and we'll consider it 
for the next release.

Cheers!
Ivan

On Monday 10 May 2010 15:33:54 Jem Rayfield wrote:
> I would second the requirement to create the molecule/PageRank/Lucene index
> on statement insert. I understand it would add a performance overhead.
> However, it would certainly be a lot nicer than having to shut down the
> repository before re-indexing the entire repo.
>
> Cheers
> Jem
>
>
>
> On 10/05/2010 12:59, "Peter Kostelnik, PhD." <peter.kostel...@tuke.sk> wrote:
> > hi, barry ..
> >
> > thanks a lot for the prompt answer ..
> > anyway, do you plan to:
> > - integrate lucene index generator into BigOWLIM's importing mechanism
> > - support custom analysers and scorers
> > - define selected FT predicates
> > ?
> >
> > to the data-type stuff - the question was just if you are using e.g.
> > NumericField for non-string literals, etc. .. but this is just minor ..
> >
> > anyway, thanks a lot for the useful info ..
> >
> > cheers,
> >                   Peter K.
> >
> >>> but, as far as I know, there should be some repository parameter set to
> >>> enable the full-text search, shouldn't there? (the repository - when
> >>> starting up - reports that Lucene search will be disabled by default ..
> >>> but all of us know that lucene-core-3.x is included :) )
> >>
> >> Lucene (core) is indeed bundled with OWLIM. However, for new
> >> installations, Lucene queries will not work until the Lucene index is
> >> built (along the lines that Naso has already described). For the current
> >> (preliminary) integration with Lucene, this can only be done on the
> >> command line and only for 'local' repositories, i.e. it can't be done
> >> through the Sesame API. It goes like this:
> >>
> >> java -DrepositoryPath=<path> com.ontotext.trree.GenerateLuceneIndex
> >>
> >> Of course, a classpath parameter is also required that includes the
> >> BigOWLIM and Lucene jar files.
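
As a rough illustration of the command above with the classpath filled in, here is a small Python sketch that only assembles the argument list. The helper name `build_index_command`, the repository path and the jar file names are made up for this example (the real jar names depend on the distribution); only the `-DrepositoryPath` property and the `com.ontotext.trree.GenerateLuceneIndex` class come from the message above.

```python
import os


def build_index_command(repository_path, jars):
    """Assemble the java command line that builds the Lucene index.

    `jars` lists the BigOWLIM and Lucene jar files. Their names are left to
    the caller because they depend on the installed distribution (assumption).
    """
    classpath = os.pathsep.join(jars)  # ':' on Unix, ';' on Windows
    return [
        "java",
        "-cp", classpath,
        "-DrepositoryPath=" + repository_path,
        "com.ontotext.trree.GenerateLuceneIndex",
    ]


if __name__ == "__main__":
    # Placeholder path and jar names, for illustration only.
    cmd = build_index_command(
        "/data/my-repo", ["lib/bigowlim.jar", "lib/lucene-core.jar"]
    )
    print(" ".join(cmd))
```

Once the paths are real, the list could be handed to `subprocess.run(cmd, check=True)`. As stated above, this only applies to 'local' repositories.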
> >>
> >>> (unusually) just a few questions:
> >>>
> >>> which predicates are indexed? .. all literals, or is there a possibility
> >>> to restrict the index to selected predicates only?
> >>
> >> For the current implementation, it is all or nothing, i.e. the entire
> >> repository. Building the index as above will index every literal and the
> >> local name of every URI.
> >>
> >>> is it possible to fire any lucene query (e.g. fuzzy queries "get~0.2 &&
> >>> me~0.7" etc.)?
> >>
> >> The full Lucene syntax is used:
> >> http://lucene.apache.org/java/3_0_0/queryparsersyntax.html
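
Since the full query parser syntax applies, user-supplied text may need escaping before it is used as a search term. The following Python sketch shows one way to do that and to compose the fuzzy query from the question. The helper names `escape_term` and `fuzzy` are invented for this example; the set of special characters is taken from the query parser syntax page linked above.

```python
# Characters with special meaning in the Lucene 3.0 query parser syntax.
LUCENE_SPECIAL = set('+-&|!(){}[]^"~*?:\\')


def escape_term(text):
    """Backslash-escape query parser metacharacters in user-supplied text."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in text)


def fuzzy(term, min_similarity):
    """Build a fuzzy term such as get~0.2 from a term and a similarity value."""
    return "%s~%s" % (escape_term(term), min_similarity)


if __name__ == "__main__":
    # Reproduces the query string from the question above.
    print(fuzzy("get", 0.2) + " && " + fuzzy("me", 0.7))
```

The resulting string is what would go into the object position of the `luceneQuery` predicate.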
> >>
> >>> is it possible to plug-in your own analyser?
> >>
> >> Not yet.
> >>
> >>> is it possible to modify the scoring function?
> >>
> >> Not yet.
> >>
> >>> are you handling data-types in FTI?
> >>
> >> I'm not sure what you mean. There is no special handling of data-types,
> >> but all literals are indexed.
> >>
> >>> I know .. lot of stuff (and even though I've got the feeling that I've
> >>> forgotten something :) )..
> >>>
> >>> thanks in advance, have a nice weekend ..
> >>>
> >>> cheers,
> >>>                       Peter K.
> >>
> >> Have a good weekend yourself!
> >> barry
> >>
> >>>> Hi Spyros,
> >>>>
> >>>> we should have provided such an example. Yes, there is a way to perform
> >>>> "RDF search" from a SPARQL query. Here is the example:
> >>>>
> >>>> PREFIX ldsr: <http://www.ontotext.com/>
> >>>>
> >>>> SELECT * WHERE {
> >>>>   ?u ldsr:luceneQuery "Amsterdam" ;
> >>>>      ldsr:preferredLabel ?l ;
> >>>>      ldsr:hasPageRank ?pr ;
> >>>>      ldsr:textSnippet ?snip .
> >>>> } LIMIT 100
> >>>>
> >>>> Essentially, it is a matter of using a system predicate, as demonstrated
> >>>> above. The query also illustrates the usage of a few other system
> >>>> predicates.
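
For anyone generating such queries from application code, here is a minimal Python sketch that builds the query text for an arbitrary keyword. The function name `build_rdf_search_query` is hypothetical; the prefix and the four system predicates are copied from the example above, and the escaping covers only backslashes, double quotes and newlines inside the string literal.

```python
def build_rdf_search_query(lucene_query, limit=100):
    """Return a SPARQL query that passes `lucene_query` to ldsr:luceneQuery."""
    # Escape characters that would otherwise break the SPARQL string literal.
    literal = (
        lucene_query.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    template = (
        "PREFIX ldsr: <http://www.ontotext.com/>\n"
        "\n"
        "SELECT * WHERE {\n"
        '  ?u ldsr:luceneQuery "%s" ;\n'
        "     ldsr:preferredLabel ?l ;\n"
        "     ldsr:hasPageRank ?pr ;\n"
        "     ldsr:textSnippet ?snip .\n"
        "} LIMIT %d\n"
    )
    return template % (literal, int(limit))


if __name__ == "__main__":
    print(build_rdf_search_query("Amsterdam"))
```

The returned text can then be submitted through whatever SPARQL client is in use; that part is deliberately left out here.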
> >>>>
> >>>> What does the luceneQuery predicate do?
> >>>> During the indexing of the LDSR repository, we do the following: for each
> >>>> node, we collect all the strings of its molecule and concatenate them in a
> >>>> single piece of text. Then we pass each of these "text molecules" to Lucene
> >>>> for indexing. We do some simple tricks to "tell" Lucene which is the URI
> >>>> and to boost the importance of the strings appearing in labels (instead of,
> >>>> say, in comments). We also use the RDF Rank of the nodes as a boost factor.
> >>>> So, the result is that you get as bindings for ?u in
> >>>>
> >>>> ?u ldsr:luceneQuery "term"
> >>>>
> >>>> a series of URIs, ordered by Lucene's judgement of their relevance to the
> >>>> query. This essentially means the standard vector space model (VSM), with
> >>>> the boosts that I mentioned above. This is what we call RDF Search - it
> >>>> allows one to retrieve RDF nodes by keywords. One can use the full
> >>>> expressivity of the Lucene query language.
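
To make the "text molecule" idea more concrete, here is a toy Python sketch of the grouping step only. It is not the BigOWLIM implementation: it assumes a molecule is just the literals attached directly to a node, it treats `rdfs:label` as the only label predicate, and it merely records the rank as a boost value instead of calling Lucene. The names `build_text_molecules` and `LABEL_PREDICATES` are invented for this example.

```python
from collections import defaultdict

# Assumption: only rdfs:label counts as a label predicate in this sketch.
LABEL_PREDICATES = {"http://www.w3.org/2000/01/rdf-schema#label"}


def build_text_molecules(literal_triples, rdf_rank):
    """Group literal values by subject node into one text document per node.

    literal_triples: iterable of (subject_uri, predicate_uri, literal_string)
    rdf_rank: dict mapping a node URI to a rank used as the document boost
    """
    labels = defaultdict(list)
    others = defaultdict(list)
    for subject, predicate, text in literal_triples:
        target = labels if predicate in LABEL_PREDICATES else others
        target[subject].append(text)

    molecules = {}
    for node in set(labels) | set(others):
        molecules[node] = {
            "uri": node,
            # Label strings are kept apart so an indexer could weight them higher.
            "label_text": " ".join(labels[node]),
            "other_text": " ".join(others[node]),
            # Nodes without a known rank get a neutral boost of 1.0.
            "boost": rdf_rank.get(node, 1.0),
        }
    return molecules
```

Each resulting dictionary corresponds to one document that would be handed to the indexer, with the URI stored alongside the text so that search hits can be mapped back to RDF nodes.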
> >>>>
> >>>> We will document all these "stealth" features in the upcoming BigOWLIM 3.3
> >>>> release. For instance, there is one alternative, proprietary FT indexing
> >>>> and search method, which implements plain but very efficient FTS for
> >>>> literals. Stay tuned!
> >>>>
> >>>> Meanwhile, enjoy LDSR :-)
> >>>>
> >>>> Naso
> >>>>
> >>>> ----------------------------------------------------------
> >>>> Atanas Kiryakov
> >>>> Executive Director of Ontotext AD, http://www.ontotext.com
> >>>> Sirma Group, http://www.sirma.bg
> >>>> Phone: (+359 2) 974 61 44; Fax: 975 3226
> >>>> ----------------------------------------------------------
> >>>> There is no mental process that can change the laws of nature or erase
> >>>> facts. The function of consciousness is not to create reality, but to
> >>>> apprehend it.
> >>>> "Existence is Identity, Consciousness is Identification."
> >>>> Ayn Rand
> >>>> ----- Original Message -----
> >>>> From: "Spyros Kotoulas" <k...@few.vu.nl>
> >>>> To: <owlim-discussion@ontotext.com>
> >>>> Sent: Friday, May 07, 2010 4:05 PM
> >>>> Subject: [Owlim-discussion] Keyword search in BigOWLIM
> >>>>
> >>>>> Hi All,
> >>>>>
> >>>>> Is there a way to combine keyword search with SPARQL in BigOWLIM?
> >>>>> For example, in the LDSR repository, there is an RDF search box which
> >>>>> answers queries fast. If I try regular expressions in SPARQL, the
> >>>>> performance is very bad. Is there a way to combine the two? For example,
> >>>>> how can I get all the triples that contain URIs with the word "Amsterdam"?
> >>>>>
> >>>>> -Spyros
> >>>>> _______________________________________________
> >>>>> OWLIM-discussion mailing list
> >>>>> OWLIM-discussion@ontotext.com
> >>>>> http://ontotext.com/mailman/listinfo/owlim-discussion
> >>>>
>

