Thanks Robert, definitely interested! We are looking into SSDs for performance too. 2.4 lets you extend QueryParser and create your own "leaf" queries. I am surprised you are mostly I/O bound; Lucene does a good job of caching. Do you do some sort of caching yourself? If your index is not changing often, there is a lot you can do without SSDs.
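[Editor's note: since extending QueryParser in 2.4 came up, here is a minimal sketch of what that subclassing can look like, assuming the Lucene 2.4-era API. `MyFuzzyQuery` is a hypothetical custom Query class standing in for whatever customized query/scorer Robert describes below; it is not from this thread.]

```java
// Sketch only: swap the stock FuzzyQuery for a custom "leaf" query by
// overriding one of QueryParser's protected factory methods (Lucene 2.4-era API).
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;

public class CustomQueryParser extends QueryParser {
    public CustomQueryParser(String field) {
        super(field, new StandardAnalyzer());
    }

    // Called whenever the parser encounters fuzzy syntax (e.g. "roam~0.8");
    // return your own Query implementation instead of the stock FuzzyQuery.
    @Override
    protected Query getFuzzyQuery(String field, String termStr, float minSimilarity) {
        // MyFuzzyQuery is hypothetical; plug in your own Query subclass here.
        return new MyFuzzyQuery(new Term(field, termStr), minSimilarity);
    }
}
```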
-John

On Wed, Dec 3, 2008 at 11:27 PM, Robert Muir <[EMAIL PROTECTED]> wrote:
> Yeah, I am using read-only.
>
> I will admit to subclassing QueryParser and having a customized query/scorer
> for several. All queries contain fuzzy queries, so this was necessary.
>
> "High" throughput, I guess, is a matter of opinion. In attempting to profile
> high throughput, the customized query/scorer again made it easy for me to
> simplify some things, such as some math in TermQuery that doesn't make sense
> (it is redundant) for my Similarity. Everything is pretty much I/O bound now,
> so if there is some throughput issue I will look into SSDs for high-volume
> indexes.
>
> I posted on Use Cases on the wiki how I made fuzzy and regex fast, if you
> are curious.
>
> On Thu, Dec 4, 2008 at 2:10 AM, John Wang <[EMAIL PROTECTED]> wrote:
>> Thanks Robert for sharing.
>> Good to hear it is working for what you need it to do.
>>
>> 3) Especially with ReadOnlyIndexReaders, you should not be blocked while
>> indexing, especially if you have multicore machines.
>> 4) Do you stay with sub-second responses with high throughput?
>>
>> -John
>>
>> On Wed, Dec 3, 2008 at 11:03 PM, Robert Muir <[EMAIL PROTECTED]> wrote:
>>> On Thu, Dec 4, 2008 at 1:24 AM, John Wang <[EMAIL PROTECTED]> wrote:
>>>> Nice!
>>>> Some questions:
>>>>
>>>> 1) One index?
>>> No, but two individual ones today were around 100M docs.
>>>
>>>> 2) How big is your document? E.g., how many terms, etc.?
>>> The last one built has over 4M terms.
>>>
>>>> 3) Are you serving (searching) the docs in realtime?
>>> I don't understand this question, but searching is slower if I am
>>> indexing on a disk that's also being searched.
>>>
>>>> 4) Search speed?
>>> Usually sub-second (or close) after some warmup. While this might seem
>>> slow, it's fast compared to the competition, trust me.
>>>
>>>> I'd love to learn more about your architecture.
>>> I hate to say you would be disappointed, but there's nothing fancy.
>>> Probably why it works...
>>>
>>>> -John
>>>>
>>>> On Wed, Dec 3, 2008 at 10:13 PM, Robert Muir <[EMAIL PROTECTED]> wrote:
>>>>> Sorry, gotta speak up on this. I indexed 300M docs today. I'm using an
>>>>> out-of-the-box jar.
>>>>>
>>>>> Yeah, I have some special subclasses, but if I thought any of this
>>>>> stuff was general enough to be useful to others I'd submit it. I'm just
>>>>> happy to have something scalable that I can customize to my
>>>>> peculiarities.
>>>>>
>>>>> So I think I fit in your 10%, and I'm not stressing on either
>>>>> scalability or API.
>>>>>
>>>>> Thanks,
>>>>> Robert
>>>>>
>>>>> On Thu, Dec 4, 2008 at 12:36 AM, John Wang <[EMAIL PROTECTED]> wrote:
>>>>>> Grant:
>>>>>> I am sorry, but I disagree with some points:
>>>>>>
>>>>>> 1) "I think it's a sign that Lucene is pretty stable." - While Lucene
>>>>>> is a great project, and great improvements have been made especially
>>>>>> with the 2.x releases, do we really have a clear picture of how Lucene
>>>>>> is being used and deployed? While Lucene works great running as a
>>>>>> vanilla search library, when pushed to its limits one needs to "hack"
>>>>>> into Lucene to make certain things work. If 90% of the user base use it
>>>>>> to build small indexes with the vanilla API, and the other 10% are
>>>>>> really stressing both the scalability and API sides and are running
>>>>>> into issues, would you still say: "it runs well for 90% of the users,
>>>>>> therefore it is stable and extensible"? I think it is unfair to the
>>>>>> project itself to be measured by the vanilla use-case. I have done a
>>>>>> couple of large deployments, e.g. >30 million documents indexed and
>>>>>> searched in realtime, and I really had to do some tweaking.
>
> --
> Robert Muir
> [EMAIL PROTECTED]
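[Editor's note: for the ReadOnlyIndexReaders John mentions, here is a hedged sketch of opening a read-only reader, assuming the Lucene 2.4-era API in which `IndexReader.open` gained a `readOnly` flag. The index path is a placeholder, not one from this thread.]

```java
// Sketch only: open a read-only IndexReader (Lucene 2.4-era API). Passing
// true for readOnly skips the per-document synchronization in isDeleted(),
// which is what helps concurrent search on multicore machines.
import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.FSDirectory;

public class ReadOnlySearchExample {
    public static void main(String[] args) throws IOException {
        // "/path/to/index" is a placeholder path.
        IndexReader reader = IndexReader.open(
                FSDirectory.getDirectory("/path/to/index"), true); // true = read-only
        IndexSearcher searcher = new IndexSearcher(reader);
        // ... run queries against searcher; the reader never blocks on writes ...
        searcher.close();
        reader.close();
    }
}
```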