This sounds very interesting... I'll definitely have a look into it. However, I have the feeling that, like Oracle Text, this keeps the data structures used for evaluating full-text conditions separate from those used for conditions over other data types, which raises other issues when trying to do full-blown mixed queries. Things get worse when doing joins and other relational algebra operations.
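
To make the concern concrete, here is a minimal sketch of a mixed query evaluated over two separate structures. It is not Oracle Text or Lucene code: the class MixedQuerySketch and its methods are hypothetical, a HashMap of posting sets stands in for the inverted index, and a TreeMap stands in for a b-tree. Each condition is answered by its own structure, and the two doc-id sets only meet in a final intersection step.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration only: two independent indexes, joined at query time.
class MixedQuerySketch {
    // Inverted index: term -> sorted doc ids (stand-in for a full-text engine).
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();
    // Ordered index: numeric value -> doc ids (TreeMap stands in for a b-tree).
    private final TreeMap<Double, List<Integer>> byValue = new TreeMap<>();

    void add(int docId, String text, double value) {
        for (String term : text.toLowerCase().split("\\s+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
        byValue.computeIfAbsent(value, v -> new ArrayList<>()).add(docId);
    }

    // Docs containing `term` whose value lies in [lo, hi], in doc-id order.
    List<Integer> search(String term, double lo, double hi) {
        // Step 1: full-text condition, answered by the inverted index alone.
        Set<Integer> textHits = postings.getOrDefault(term.toLowerCase(), new TreeSet<>());
        // Step 2: range condition, answered by the ordered index alone.
        Set<Integer> rangeHits = new HashSet<>();
        for (List<Integer> ids : byValue.subMap(lo, true, hi, true).values()) {
            rangeHits.addAll(ids);
        }
        // Step 3: the only point where the two structures interact.
        List<Integer> result = new ArrayList<>();
        for (Integer id : textHits) {
            if (rangeHits.contains(id)) {
                result.add(id);
            }
        }
        return result;
    }
}
```

In this sketch the range side materializes every matching doc id before the intersection, whether or not the text condition is selective. That cost is what tends to grow once joins and other operators are layered on top.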
I'm still wondering if the basic data structures should be revised to achieve better performance...

-- Joaquin

2007/1/10, robert engels <[EMAIL PROTECTED]>:
There is a module in Lucene contrib that changes that! It loads Lucene into the Oracle database (it has a JVM), and allows Lucene syntax to perform full-text searching.

On Jan 10, 2007, at 2:37 PM, J. Delgado wrote:

> No, Oracle Text does not use Lucene. It has its own proprietary
> full-text engine. It represents documents, the inverted index and
> relationships in a DB schema and it depends heavily on the SQL layer.
> This has some severe limitations though...
>
> Of course, you can push structured data into full-text based indexes.
> We have seen how in Lucene we can represent some structured data types
> (e.g. dates, numbers) as fields and perform some type of mixed queries
> but the Lucene index, as some of you have pointed out, is not meant
> for this and does not scale like a DB would.
>
> I'm looking to hear new ideas people may have to solve this very
> hard problem.
>
> -- Joaquin
>
> 2007/1/10, robert engels <[EMAIL PROTECTED]>:
>> I think the contrib 'Oracle Full Text' does this (although in the
>> reverse).
>>
>> It uses Lucene for full text queries (embedded into the db), the
>> query analyzer works.
>>
>> It is really a great piece of software. Too bad it can't be done in a
>> standard way so that it would work with all dbs.
>>
>> I think it may be possible to embed Apache Derby to do
>> something like this, although this might be overkill. A simple b-tree
>> db might work best.
>>
>> It would be interesting if the documents could be stored in a btree,
>> and a GUID used to access them (since the lucene docid is constantly
>> changing). The only stored field in a lucene Document would be the
>> GUID.
>>
>> On Jan 10, 2007, at 2:21 PM, J. Delgado wrote:
>>
>> > This is a more general question:
>> >
>> > Given the fact that most applications require querying a combination
>> > of full-text and structured data has anyone looked into building data
>> > structures at the most fundamental level (e.g. combination of b-tree
>> > and inverted lists) that would enable scalable and performant
>> > structured (e.g. SQL or XQuery) + Full-Text queries?
>> >
>> > Can Lucene be taken as basis for this or do you recommend exploring
>> > other routes?
>> >
>> > -- Joaquin
>> >
>> > 2007/1/10, Chris Hostetter <[EMAIL PROTECTED]>:
>> >>
>> >> : So you mean lucene can't do better than this ?
>> >>
>> >> robert's point is that based on what you've told us, there is no
>> >> reason to think Lucene makes sense for you -- if *all* you are doing
>> >> is finding documents based on numeric ranges, then a relational
>> >> database is better suited to your task. if you actually care about
>> >> the textual IR features of Lucene, then there are probably ways to
>> >> make your searches faster, but you aren't giving us enough
>> >> information.
>> >>
>> >> you said the example code you gave was in a loop ... but a loop over
>> >> what? .. what changes with each iteration of the loop? ... if there
>> >> are RangeFilter's that get reused more than once,
>> >> CachingWrapperFilter can come in handy to ensure that work isn't
>> >> done more often than it needs to be.
>> >>
>> >> it's also not clear whether your query on "type:0" is just a
>> >> placeholder, or indicative of what you actually want to do in the
>> >> long run ... if all of your queries are this simple, and all you
>> >> care about is getting a count of things that have type:0 and are in
>> >> your numeric ranges, then don't use the "search" method at all, just
>> >> put "type:0" in your ChainedFilter and call the "bits" method
>> >> directly.
>> >>
>> >> you also haven't given us any information about whether or not you
>> >> are opening a new IndexSearcher/IndexReader every time you execute a
>> >> query, or reusing the same instance -- reuse makes the performance
>> >> much better because it can reuse underlying resources.
>> >>
>> >> In short: if you state some performance numbers from timing some
>> >> code, and want to know how to make that code faster, you have to
>> >> actually show people *all* of the code for them to be able to help
>> >> you.
>> >>
>> >> : >> I still have the search problem I had before, now search takes
>> >> : >> around 750 msecs for a small set of documents.
>> >> : >>
>> >> : >> [java] Total Query Processing time (msec) : 38745
>> >> : >> [java] Total No. of Documents : 7,500,000
>> >> : >> [java] Total No. of Executed queries : 50.0
>> >> : >> [java] Execution time per query : 774.9 msec
>> >> : >>
>> >> : >> The index is optimized and its size is 830 MB.
>> >> : >> Each document has the following terms :
>> >> : >> VSID(integer), data(float), type(short int), precision(byte).
>> >> : >> The queries are generated in a loop similar to the one below :
>> >> : >> loop ...
>> >> : >>   RangeFilter rq1 = new RangeFilter("data", "+5.43243243440000", "+5.43243243449999", true, true);
>> >> : >>   RangeFilter rq2 = new RangeFilter("precision", "+0001", "+0002", true, true);
>> >> : >>   ChainedFilter cf = new ChainedFilter(new Filter[]{rq2, rq1}, ChainedFilter.AND);
>> >> : >>   Query query = qp.parse("type:0");
>> >> : >>   Hits hits = searcher.search(query, cf);
>> >> : >> end loop
>> >> : >>
>> >> : >> I would like to know if there exists any solution to improve
>> >> : >> the search time ? (I need to insert more than 500 million of
>> >> : >> these data pages into lucene)
>> >>
>> >> -Hoss

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]