Yeah, I am using read-only IndexReaders. I will admit to subclassing QueryParser and having customized Query/Scorer implementations for several query types. All queries contain fuzzy queries, so this was necessary.
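If it helps, the QueryParser part is roughly this kind of thing. This is a stripped-down sketch, not my actual code: the class name, the WhitespaceAnalyzer, and the 0.7/2 fuzzy settings are just placeholders, written against the 2.4-era QueryParser API.

import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.FuzzyQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

/**
 * Sketch of a QueryParser subclass that turns plain term queries into fuzzy
 * ones. Placeholder names and values, not the real thing.
 */
public class FuzzyingQueryParser extends QueryParser {

  public FuzzyingQueryParser(String defaultField) {
    super(defaultField, new WhitespaceAnalyzer()); // 2.4-style constructor
  }

  @Override
  protected Query getFieldQuery(String field, String queryText) throws ParseException {
    Query q = super.getFieldQuery(field, queryText);
    if (q instanceof TermQuery) {
      // rewrite single-term queries as fuzzy: min similarity 0.7, prefix length 2
      Term t = ((TermQuery) q).getTerm();
      return new FuzzyQuery(t, 0.7f, 2);
    }
    return q; // phrases etc. pass through unchanged
  }
}

Anything that parses down to a single TermQuery comes back as a FuzzyQuery, which is one cheap way to get "all queries contain fuzzy queries".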
"high" throughput i guess is a matter of opinion. in attempting to profile high-throughput, again customized query/scorer made it easy for me to simplify some things, such as some math in termquery that doesn't make sense (redundant) for my Similarity. everything is pretty much i/o bound now so if tehre is some throughput issue i will look into SSD for high volume indexes. i posted on Use Cases on the wiki how I made fuzzy and regex fast if you are curious. On Thu, Dec 4, 2008 at 2:10 AM, John Wang <[EMAIL PROTECTED]> wrote: > Thanks Robert for sharing. > Good to hear it is working for what you need it to do. > > 3) Especially with ReadOnlyIndexReaders, you should not be blocked while > indexing. Especially if you have multicore machines. > 4) do you stay with sub-second responses with high thru-put? > > -John > > > On Wed, Dec 3, 2008 at 11:03 PM, Robert Muir <[EMAIL PROTECTED]> wrote: > >> >> >> On Thu, Dec 4, 2008 at 1:24 AM, John Wang <[EMAIL PROTECTED]> wrote: >> >>> Nice! >>> Some questions: >>> >>> 1) one index? >>> >> no, but two individual ones today were around 100M docs >> >>> 2) how big is your document? e.g. how many terms etc. >>> >> last one built has over 4M terms >> >>> 3) are you serving(searching) the docs in realtime? >>> >> i dont understand this question, but searching is slower if i am indexing >> on a disk thats also being searched. >> >>> >>> 4) search speed? >>> >> usually subsecond (or close) after some warmup. while this might seem slow >> its fast compared to the competition, trust me. >> >>> >>> I'd love to learn more about your architecture. >>> >> i hate to say you would be disappointed, but theres nothign fancy. >> probably why it works... >> >>> >>> -John >>> >>> >>> On Wed, Dec 3, 2008 at 10:13 PM, Robert Muir <[EMAIL PROTECTED]> wrote: >>> >>>> sorry gotta speak up on this. i indexed 300m docs today. I'm using an >>>> out of box jar. >>>> >>>> yeah i have some special subclasses but if i thought any of this stuff >>>> was general enough to be useful to others i'd submit it. I'm just happy to >>>> have something scalable that i can customize to my peculiarities. >>>> >>>> so i think i fit in your 10% and im not stressing on either scalability >>>> or api. >>>> >>>> thanks, >>>> robert >>>> >>>> >>>> On Thu, Dec 4, 2008 at 12:36 AM, John Wang <[EMAIL PROTECTED]> wrote: >>>> >>>>> Grant: >>>>> I am sorry that I disagree with some points: >>>>> >>>>> 1) "I think it's a sign that Lucene is pretty stable." - While lucene >>>>> is a great project, especially with 2.x releases, great improvements are >>>>> made, but do we really have a clear picture on how lucene is being used >>>>> and >>>>> deployed. While lucene works great running as a vanilla search library, >>>>> when >>>>> pushed to limits, one needs to "hack" into lucene to make certain things >>>>> work. If 90% of the user base use it to build small indexes and using the >>>>> vanilla api, and the other 10% is really stressing both on the scalability >>>>> and api side and are running into issues, would you still say: "running >>>>> well >>>>> for 90% of the users, therefore it is stable or extensible"? I think it is >>>>> unfair to the project itself to be measured by the vanilla use-case. I >>>>> have >>>>> done couple of large deployments, e.g. >30 million documents indexed and >>>>> searched in realtime., and I really had to do some tweaking. 
--
Robert Muir
[EMAIL PROTECTED]