Ivan Peikov
Tue, 11 May 2010 00:16:28 -0700
Sure Jem,

this is a nice enhancement to the FT-indexing and we'll consider it for the next release.

Cheers!
Ivan

On Monday 10 May 2010 15:33:54 Jem Rayfield wrote:
> I would second the requirement to create molecule/pagerank/luceneindex on
> statement insert. Understand it would be performance overhead. However
> certainly a lot nicer than having to shut down the repository before
> re-indexing the entire repo.
>
> Cheers
> Jem
>
> On 10/05/2010 12:59, "Peter Kostelnik, PhD." <peter.kostel...@tuke.sk> wrote:
> > hi, barry ..
> >
> > thanks a lot for the prompt answer ..
> > anyway, do you plan to:
> > - integrate lucene index generator into BigOWLIM's importing mechanism
> > - support custom analysers and scorers
> > - define selected FT predicates
> > ?
> >
> > to the data-type stuff - the question was just if you are using e.g.
> > NumericField for non-string literals, etc. .. but this is just minor ..
> >
> > anyway, thanks a lot for useful info ..
> >
> > cheers,
> > Peter K.
> >
> >>> but, as far as I know, there should be some repository parameter set to
> >>> enable the full-text, shouldn't it? (repository - when heating up - is
> >>> deliberating, that Lucene search will be disabled by default .. but all
> >>> of us know, that lucene-core-3.x is included :) )
> >>
> >> Lucene (core) is indeed bundled with OWLIM. However, for new
> >> installations, Lucene queries will not work until the Lucene index is
> >> built (along the lines that Naso has already described). For the current
> >> (preliminary) integration with Lucene, this can only be done on the
> >> command line and only for 'local' repositories, i.e. it can't be done
> >> through the Sesame API. It goes like this:
> >>
> >> java -DrepositoryPath=<path> com.ontotext.trree.GenerateLuceneIndex
> >>
> >> Of course, a classpath parameter is also required that includes the
> >> BigOWLIM and Lucene jar files.
> >>
> >>> (unusually) just a few questions:
> >>>
> >>> which predicates are indexed? .. all literals or is there possibility
> >>> to cut the index only to selected predicates?
> >>
> >> For the current implementation, it is all or nothing, i.e. the entire
> >> repository. Building the index as above will index every literal and the
> >> local name of every URI.
> >>
> >>> is it possible to fire any lucene query (e.g. fuzzy queries "get~0.2 &&
> >>> me~0.7" etc.)?
> >>
> >> The full Lucene syntax is used:
> >> http://lucene.apache.org/java/3_0_0/queryparsersyntax.html
> >>
> >>> is it possible to plug-in your own analyser?
> >>
> >> Not yet.
> >>
> >>> is it possible to modify the scoring function?
> >>
> >> Not yet.
> >>
> >>> are you handling data-types in FTI?
> >>
> >> I'm not sure what you mean. There is no special handling of data-types,
> >> but all literals are indexed.
> >>
> >>> I know .. lot of stuff (and even though I've got the feeling that I've
> >>> forgotten something :) )..
> >>>
> >>> thanks in advance, have a nice weekend ..
> >>>
> >>> cheers,
> >>> Peter K.
> >>
> >> Have a good weekend yourself!
> >> barry
> >>
> >>>> Hi Spyros,
> >>>>
> >>>> we should have provided such an example. Yes, there is a way to perform
> >>>> "RDF search" from a SPARQL query. Here is the example:
> >>>>
> >>>> PREFIX ldsr: <http://www.ontotext.com/>
> >>>>
> >>>> SELECT * WHERE {
> >>>>   ?u ldsr:luceneQuery "Amsterdam" ;
> >>>>      ldsr:preferredLabel ?l ;
> >>>>      ldsr:hasPageRank ?pr ;
> >>>>      ldsr:textSnippet ?snip .
> >>>> } LIMIT 100
> >>>>
> >>>> Essentially, it is a matter of using a system predicate, as demonstrated
> >>>> above. The query also illustrates the usage of a few other system
> >>>> predicates.
> >>>>
> >>>> What does the luceneQuery predicate do?
> >>>> During the indexing of the LDSR repository, we do the following: for each
> >>>> node, we collect all the strings of its molecule and concatenate them in a
> >>>> single piece of text.
> >>>> Then we pass each of these "text molecules" for
> >>>> indexing to Lucene. We do some simple tricks to "tell" Lucene which is the
> >>>> URI and to boost the importance of the strings appearing in labels
> >>>> (instead of, say, in comments). We also put the RDF Rank of the nodes as a
> >>>> boost factor. So, the result is that you get as bindings for ?u in
> >>>>
> >>>> ?u ldsr:luceneQuery "term"
> >>>>
> >>>> a series of URIs, ordered by Lucene's judgement of their relevance to the
> >>>> query. This essentially means standard VSM, with the boosts that I
> >>>> mentioned above. This is what we call RDF Search - it allows one to
> >>>> retrieve RDF nodes by keywords. One can use the full expressivity of the
> >>>> Lucene query language.
> >>>>
> >>>> We will document all these "stealth" features in the upcoming BigOWLIM 3.3
> >>>> release. For instance, there is one alternative, proprietary, FT indexing
> >>>> and search method, which implements plain, but very efficient FTS for
> >>>> literals. Stay tuned!
> >>>>
> >>>> Meanwhile, enjoy LDSR :-)
> >>>>
> >>>> Naso
> >>>>
> >>>> ----------------------------------------------------------
> >>>> Atanas Kiryakov
> >>>> Executive Director of Ontotext AD, http://www.ontotext.com
> >>>> Sirma Group, http://www.sirma.bg
> >>>> Phone: (+359 2) 974 61 44; Fax: 975 3226
> >>>> ----------------------------------------------------------
> >>>> There is no mental process that can change the laws of nature or erase
> >>>> facts. The function of consciousness is not to create reality, but to
> >>>> apprehend it.
> >>>> "Existence is Identity, Consciousness is Identification."
> >>>> Ayn Rand
> >>>> ----- Original Message -----
> >>>> From: "Spyros Kotoulas" <k...@few.vu.nl>
> >>>> To: <owlim-discussion@ontotext.com>
> >>>> Sent: Friday, May 07, 2010 4:05 PM
> >>>> Subject: [Owlim-discussion] Keyword search in BigOWLIM
> >>>>
> >>>>> Hi All,
> >>>>>
> >>>>> Is there a way to combine keyword search with SPARQL in BigOWLIM?
> >>>>> For example, in the LDSR repository, there is an RDF search box which
> >>>>> answers queries fast. If I try regular expressions in SPARQL, the
> >>>>> performance is very bad. Is there a way to combine the two? For example,
> >>>>> how can I get all the triples that contain URIs with the word
> >>>>> "Amsterdam"?
> >>>>>
> >>>>> -Spyros
> >>>>> _______________________________________________
> >>>>> OWLIM-discussion mailing list
> >>>>> OWLIM-discussion@ontotext.com
> >>>>> http://ontotext.com/mailman/listinfo/owlim-discussion
>
> http://www.bbc.co.uk/
> This e-mail (and any attachments) is confidential and may contain personal
> views which are not the views of the BBC unless specifically stated. If you
> have received it in error, please delete it from your system. Do not use,
> copy or disclose the information in any way nor act in reliance on it and
> notify the sender immediately. Please note that the BBC monitors e-mails
> sent or received.
> Further communication will signify your consent to this.
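For readers following Naso's description of "text molecules" earlier in the thread, here is a minimal Python sketch of the idea. It is an illustration only, not BigOWLIM's implementation and not Lucene's scoring: the "molecule" is simplified to the literals directly attached to a node, the label predicates and the `LABEL_BOOST` weight are hypothetical, the label boost is imitated by repeating the label text, and the ranking is a plain term count multiplied by a per-node rank rather than real VSM.

```python
from collections import defaultdict

# Hypothetical choices for this sketch; not taken from BigOWLIM.
LABEL_PREDICATES = {"rdfs:label", "skos:prefLabel"}
LABEL_BOOST = 2  # assumption: a label string counts twice


def build_text_molecules(triples):
    """Join the literal strings attached to each subject into one text per node.

    `triples` is an iterable of (subject, predicate, obj, is_literal) tuples.
    Label strings are repeated LABEL_BOOST times as a crude stand-in for a boost.
    """
    parts = defaultdict(list)
    for subject, predicate, obj, is_literal in triples:
        if not is_literal:
            continue  # only strings go into the text molecule
        repeat = LABEL_BOOST if predicate in LABEL_PREDICATES else 1
        parts[subject].extend([obj] * repeat)
    return {node: " ".join(strings) for node, strings in parts.items()}


def search(molecules, term, rank=None):
    """Return nodes whose text contains `term`, best first.

    The score is (number of occurrences) * (rank of the node, default 1.0),
    a toy replacement for Lucene relevance combined with an RDF Rank boost.
    """
    rank = rank or {}
    term = term.lower()
    scored = []
    for node, text in molecules.items():
        hits = text.lower().split().count(term)
        if hits:
            scored.append((hits * rank.get(node, 1.0), node))
    # Highest score first; ties broken by node name for a stable order.
    return [node for _, node in sorted(scored, key=lambda s: (-s[0], s[1]))]


if __name__ == "__main__":
    triples = [
        ("ex:Amsterdam", "rdfs:label", "Amsterdam", True),
        ("ex:Amsterdam", "rdfs:comment", "Capital of the Netherlands", True),
        ("ex:Amsterdam", "ex:country", "ex:NL", False),
        ("ex:Schiphol", "rdfs:comment", "Airport near Amsterdam", True),
    ]
    molecules = build_text_molecules(triples)
    print(search(molecules, "Amsterdam"))
```

In this toy data the node labelled "Amsterdam" comes before the node that only mentions the word in a comment, which mirrors the effect of the label boost described in the thread. Passing a `rank` dictionary changes the order in the same spirit as the RDF Rank boost.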