Thanks Robert, definitely interested!
We are too, looking into SSDs for performance.
2.4 allows you to extend QueryParser and create your own "leaf" queries.
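
For reference, overriding one of those leaf-query hooks looks roughly like this (a sketch against the 2.4 API as I recall it; the exact factory-method signatures should be double-checked against the 2.4 javadocs, and the "custom" behavior here is hypothetical):

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.FuzzyQuery;
import org.apache.lucene.search.Query;

// Sketch: Lucene 2.4's QueryParser exposes protected new*Query() factory
// methods, so a subclass can substitute its own leaf queries.
class CustomQueryParser extends QueryParser {
    CustomQueryParser(String field, Analyzer analyzer) {
        super(field, analyzer);
    }

    @Override
    protected Query newFuzzyQuery(Term term, float minimumSimilarity, int prefixLength) {
        // Return your own Query/Scorer implementation here instead;
        // the stock FuzzyQuery is just a placeholder.
        return new FuzzyQuery(term, minimumSimilarity, prefixLength);
    }
}
```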
I am surprised you are mostly I/O bound. Lucene does a good job of caching. Do
you do some sort of caching yourself? If your index is not changing often,
there is a lot you can do without SSDs.
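
For example, when the index is mostly static, even a small LRU cache of results keyed by the raw query string can absorb repeated queries before they ever hit the disk. A minimal, Lucene-agnostic sketch (the class name and capacity are made up for illustration):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical LRU result cache keyed by query string.
// LinkedHashMap with accessOrder=true evicts least-recently-used entries.
class QueryResultCache<V> {
    private final Map<String, V> map;

    QueryResultCache(final int capacity) {
        this.map = new LinkedHashMap<String, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                // Evict the least-recently-used entry once over capacity.
                return size() > capacity;
            }
        };
    }

    synchronized V get(String query) { return map.get(query); }

    synchronized void put(String query, V results) { map.put(query, results); }
}
```

Whether this helps obviously depends on how skewed the query distribution is; any change to the index has to invalidate the cache.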

-John

On Wed, Dec 3, 2008 at 11:27 PM, Robert Muir <[EMAIL PROTECTED]> wrote:

> yeah i am using read-only.
>
> i will admit to subclassing queryparser and having customized query/scorers
> for several query types. all queries contain fuzzy queries so this was necessary.
>
> "high" throughput i guess is a matter of opinion. in attempting to profile
> high-throughput, again customized query/scorer made it easy for me to
> simplify some things, such as some math in termquery that doesn't make sense
> (redundant) for my Similarity. everything is pretty much i/o bound now so if
> there is some throughput issue i will look into SSD for high volume indexes.
>
> i posted on Use Cases on the wiki how I made fuzzy and regex fast if you
> are curious.
>
>
> On Thu, Dec 4, 2008 at 2:10 AM, John Wang <[EMAIL PROTECTED]> wrote:
>
>> Thanks Robert for sharing.
>> Good to hear it is working for what you need it to do.
>>
>> 3) With ReadOnlyIndexReaders, you should not be blocked while indexing,
>> especially if you have multicore machines.
>> 4) do you stay at sub-second responses with high throughput?
>>
>> -John
>>
>>
>> On Wed, Dec 3, 2008 at 11:03 PM, Robert Muir <[EMAIL PROTECTED]> wrote:
>>
>>>
>>>
>>> On Thu, Dec 4, 2008 at 1:24 AM, John Wang <[EMAIL PROTECTED]> wrote:
>>>
>>>> Nice!
>>>> Some questions:
>>>>
>>>> 1) one index?
>>>>
>>> no, but two individual ones today were around 100M docs
>>>
>>>> 2) how big is your document? e.g. how many terms etc.
>>>>
>>> last one built has over 4M terms
>>>
>>>> 3) are you serving(searching) the docs in realtime?
>>>>
>>> i don't understand this question, but searching is slower if i am indexing
>>> on a disk that's also being searched.
>>>
>>>>
>>>> 4) search speed?
>>>>
>>> usually subsecond (or close) after some warmup. while this might seem
>>> slow, it's fast compared to the competition, trust me.
>>>
>>>>
>>>> I'd love to learn more about your architecture.
>>>>
>>> i hate to say you would be disappointed, but there's nothing fancy.
>>> probably why it works...
>>>
>>>>
>>>> -John
>>>>
>>>>
>>>> On Wed, Dec 3, 2008 at 10:13 PM, Robert Muir <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> sorry, gotta speak up on this. i indexed 300m docs today. I'm using an
>>>>> out-of-the-box jar.
>>>>>
>>>>> yeah i have some special subclasses but if i thought any of this stuff
>>>>> was general enough to be useful to others i'd submit it. I'm just happy to
>>>>> have something scalable that i can customize to my peculiarities.
>>>>>
>>>>> so i think i fit in your 10% and i'm not stressing on either scalability
>>>>> or api.
>>>>>
>>>>> thanks,
>>>>> robert
>>>>>
>>>>>
>>>>> On Thu, Dec 4, 2008 at 12:36 AM, John Wang <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> Grant:
>>>>>>         I am sorry, but I disagree with some points:
>>>>>>
>>>>>> 1) "I think it's a sign that Lucene is pretty stable." - While Lucene
>>>>>> is a great project, and great improvements have been made, especially
>>>>>> in the 2.x releases, do we really have a clear picture of how Lucene
>>>>>> is being used and deployed? Lucene works great as a vanilla search
>>>>>> library, but when pushed to its limits, one needs to "hack" into it
>>>>>> to make certain things work. If 90% of the user base builds small
>>>>>> indexes with the vanilla API, while the other 10% really stresses both
>>>>>> the scalability and the API and runs into issues, would you still say
>>>>>> "it runs well for 90% of the users, therefore it is stable and
>>>>>> extensible"? I think it is unfair to the project to be measured only
>>>>>> by the vanilla use case. I have done a couple of large deployments,
>>>>>> e.g. >30 million documents indexed and searched in realtime, and I
>>>>>> really had to do some tweaking.
>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> Robert Muir
>>>>> [EMAIL PROTECTED]
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Robert Muir
>>> [EMAIL PROTECTED]
>>>
>>
>>
>
>
> --
> Robert Muir
> [EMAIL PROTECTED]
>
