I think the contrib 'Oracle Full Text' does this (although in
reverse). It uses Lucene for full-text queries (embedded into the
database), and the query analyzer works with it. It is really a great
piece of software. Too bad it can't be done in a standard way so that
it would work with all databases.
I think it may be possible to embed Apache Derby to do something like
this, although that might be overkill. A simple b-tree database might
work best.

It would be interesting if the documents could be stored in a b-tree
and a GUID used to access them (since the Lucene doc id is constantly
changing). The only stored field in a Lucene Document would be the
GUID.
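
A rough sketch of that idea (just how I imagine it could look, assuming
Lucene 2.x-era APIs; the "guid"/"body" field names and the TreeMap that
stands in for the external b-tree store are made up for illustration):

    import java.util.TreeMap;
    import java.util.UUID;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.store.RAMDirectory;

    public class GuidIndexSketch {
        public static void main(String[] args) throws Exception {
            // Stand-in for the external b-tree store: GUID -> full document.
            TreeMap<String, String> btree = new TreeMap<String, String>();

            RAMDirectory dir = new RAMDirectory();
            IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), true);

            String guid = UUID.randomUUID().toString();
            String body = "the quick brown fox";
            btree.put(guid, body);  // the content itself lives in the b-tree

            Document doc = new Document();
            // The GUID is the only stored field; everything else is indexed only.
            doc.add(new Field("guid", guid, Field.Store.YES, Field.Index.UN_TOKENIZED));
            doc.add(new Field("body", body, Field.Store.NO, Field.Index.TOKENIZED));
            writer.addDocument(doc);
            writer.close();

            IndexSearcher searcher = new IndexSearcher(dir);
            Query q = new QueryParser("body", new StandardAnalyzer()).parse("fox");
            Hits hits = searcher.search(q);
            for (int i = 0; i < hits.length(); i++) {
                // the GUID is stable, unlike the Lucene doc id
                String g = hits.doc(i).get("guid");
                System.out.println(g + " -> " + btree.get(g));
            }
            searcher.close();
        }
    }
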
On Jan 10, 2007, at 2:21 PM, J. Delgado wrote:
> This is a more general question:
>
> Given the fact that most applications require querying a combination
> of full-text and structured data, has anyone looked into building
> data structures at the most fundamental level (e.g. a combination of
> b-tree and inverted lists) that would enable scalable and performant
> structured (e.g. SQL or XQuery) + full-text queries?
>
> Can Lucene be taken as a basis for this, or do you recommend
> exploring other routes?
>
> -- Joaquin
>
> 2007/1/10, Chris Hostetter <[EMAIL PROTECTED]>:
>>
>> : So you mean Lucene can't do better than this?
>>
>> robert's point is that, based on what you've told us, there is no
>> reason to think Lucene makes sense for you -- if *all* you are doing
>> is finding documents based on numeric ranges, then a relational
>> database is better suited to your task. if you actually care about
>> the textual IR features of Lucene, then there are probably ways to
>> make your searches faster, but you aren't giving us enough
>> information.
>>
>> you said the example code you gave was in a loop ... but a loop
>> over what? ... what changes with each iteration of the loop? ... if
>> there are RangeFilters that get reused more than once,
>> CachingWrapperFilter can come in handy to ensure that work isn't
>> done more often than it needs to be.
>>
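A small sketch of that suggestion, assuming the "precision" range really
is constant across loop iterations ("dataFilter" below is a hypothetical
per-iteration filter, and ChainedFilter comes from contrib; imports omitted):

    // build once, outside the loop; CachingWrapperFilter caches the
    // filter's bits per IndexReader, so it only pays off when the same
    // reader is reused across iterations
    Filter precisionFilter = new CachingWrapperFilter(
        new RangeFilter("precision", "+0001", "+0002", true, true));

    // inside the loop, combine it with whatever actually changes
    ChainedFilter cf = new ChainedFilter(
        new Filter[]{ precisionFilter, dataFilter }, ChainedFilter.AND);
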
>> it's also not clear whether your query on "type:0" is just a
>> placeholder, or indicative of what you actually want to do in the
>> long run ... if all of your queries are this simple, and all you
>> care about is getting a count of things that have type:0 and are in
>> your numeric ranges, then don't use the "search" method at all, just
>> put "type:0" in your ChainedFilter and call the "bits" method
>> directly.
>>
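A sketch of that counting-only approach, using the Lucene 2.x
Filter.bits(IndexReader) API that Hoss refers to (rq1/rq2 are the range
filters from the quoted code below, and "reader" is an already-open
IndexReader; imports omitted):

    // wrap the term query in a filter too, so no scoring happens at all
    Filter typeFilter = new QueryFilter(new TermQuery(new Term("type", "0")));
    ChainedFilter cf = new ChainedFilter(
        new Filter[]{ typeFilter, rq1, rq2 }, ChainedFilter.AND);
    java.util.BitSet bits = cf.bits(reader);   // 'reader' opened elsewhere
    int count = bits.cardinality();            // number of matching documents
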
>> you also haven't given us any information about whether or not you
>> are opening a new IndexSearcher/IndexReader every time you execute a
>> query, or reusing the same instance -- reuse makes the performance
>> much better because it can reuse underlying resources.
>>
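For example (the index path here is only a placeholder):

    // open once at application startup and reuse for every query
    IndexSearcher searcher = new IndexSearcher("/path/to/index");

    // ... run all of the queries against this one instance ...

    searcher.close();  // only on shutdown, or when reopening after updates
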
>> In short: if you state some performance numbers from timing some
>> code, and want to know how to make that code faster, you have to
>> actually show people *all* of the code for them to be able to help
>> you.
>>
>>
>> : >> I still have the search problem I had before, now search takes
>> : >> around 750 msec for a small set of documents.
>> : >> [java] Total Query Processing time (msec) : 38745
>> : >> [java] Total No. of Documents : 7,500,000
>> : >> [java] Total No. of Executed queries : 50.0
>> : >> [java] Execution time per query : 774.9 msec
>> : >>
>> : >> The index is optimized and its size is 830 MB.
>> : >> Each document has the following terms:
>> : >> VSID (integer), data (float), type (short int), precision (byte).
>> : >> The queries are generated in a loop similar to the one below:
>> : >> loop ...
>> : >>   RangeFilter rq1 = new RangeFilter("data",
>> : >>       "+5.43243243440000", "+5.43243243449999", true, true);
>> : >>   RangeFilter rq2 = new RangeFilter("precision",
>> : >>       "+0001", "+0002", true, true);
>> : >>   ChainedFilter cf = new ChainedFilter(
>> : >>       new Filter[]{rq2, rq1}, ChainedFilter.AND);
>> : >>   Query query = qp.parse("type:0");
>> : >>   Hits hits = searcher.search(query, cf);
>> : >> end loop
>> : >>
>> : >> I would like to know whether there is any way to improve the
>> : >> search time. (I need to insert more than 500 million of these
>> : >> data pages into Lucene.)
>>
>>
>>
>>
>> -Hoss
>>
>>
>>