Yes, I am using read-only IndexReaders.
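For reference, opening them looks roughly like this with the 2.4-era API (the path and class name below are placeholders, not what I actually run):

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class ReadOnlySearchSketch {
    public static void main(String[] args) throws Exception {
        Directory dir = FSDirectory.getDirectory("/path/to/index"); // placeholder path

        // readOnly=true: this reader rejects deletes/norm updates, which lets
        // it skip that bookkeeping and reduces synchronization when many
        // search threads share it.
        IndexReader reader = IndexReader.open(dir, true);
        IndexSearcher searcher = new IndexSearcher(reader);

        // ... run queries against searcher here ...

        searcher.close();
        reader.close();
    }
}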

I will admit to subclassing QueryParser and having customized Query/Scorer
implementations for several query types. All of my queries contain fuzzy
queries, so this was necessary.
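To give an idea of the kind of subclassing I mean, here is a simplified sketch
(not my actual code; the class name, similarity threshold, and prefix length
are made up):

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.FuzzyQuery;
import org.apache.lucene.search.Query;

// Illustrative subclass: routes terms into fuzzy queries with a fixed
// non-zero prefix length so the rewrite only scans part of the term dictionary.
public class FuzzyEverythingQueryParser extends QueryParser {

    public FuzzyEverythingQueryParser(String field, Analyzer analyzer) {
        super(field, analyzer);
    }

    @Override
    protected Query getFieldQuery(String field, String queryText) throws ParseException {
        // Treat a plain term as a fuzzy term. A real implementation would run
        // the analyzer and handle multi-word input; omitted here for brevity.
        return new FuzzyQuery(new Term(field, queryText), 0.7f, 2);
    }

    @Override
    protected Query getFuzzyQuery(String field, String termStr, float minSimilarity) {
        // Clamp user-supplied similarity and always use a non-zero prefix.
        return new FuzzyQuery(new Term(field, termStr), Math.max(minSimilarity, 0.7f), 2);
    }
}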

"high" throughput i guess is a matter of opinion. in attempting to profile
high-throughput, again customized query/scorer made it easy for me to
simplify some things, such as some math in termquery that doesn't make sense
(redundant) for my Similarity. everything is pretty much i/o bound now so if
tehre is some throughput issue i will look into SSD for high volume indexes.
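To illustrate the kind of thing I mean (this is not my actual Similarity, just
a made-up example with arbitrary constants):

import org.apache.lucene.search.DefaultSimilarity;

// Illustrative only: flattens several scoring factors to constants, so the
// corresponding arithmetic in the stock TermQuery scorer contributes nothing.
public class FlatSimilarity extends DefaultSimilarity {

    @Override
    public float lengthNorm(String fieldName, int numTerms) {
        return 1.0f; // ignore document length
    }

    @Override
    public float idf(int docFreq, int numDocs) {
        return 1.0f; // ignore collection statistics
    }

    @Override
    public float tf(float freq) {
        return freq > 0 ? 1.0f : 0.0f; // presence/absence only
    }
}

With something like this installed via IndexSearcher.setSimilarity() (and
IndexWriter.setSimilarity() at index time, so the norms match), TermQuery's
scorer still multiplies in factors that are always 1.0f, which is the sort of
redundancy a customized scorer can drop.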

I posted on the Use Cases page on the wiki how I made fuzzy and regex queries
fast, if you are curious.

On Thu, Dec 4, 2008 at 2:10 AM, John Wang <[EMAIL PROTECTED]> wrote:

> Thanks Robert for sharing.
> Good to hear it is working for what you need it to do.
>
> 3) Especially with ReadOnlyIndexReaders, you should not be blocked while
> indexing, especially if you have multicore machines.
> 4) Do you stay at sub-second responses with high throughput?
>
> -John
>
>
> On Wed, Dec 3, 2008 at 11:03 PM, Robert Muir <[EMAIL PROTECTED]> wrote:
>
>>
>>
>> On Thu, Dec 4, 2008 at 1:24 AM, John Wang <[EMAIL PROTECTED]> wrote:
>>
>>> Nice!
>>> Some questions:
>>>
>>> 1) one index?
>>>
>> No, but two individual ones today were around 100M docs.
>>
>>> 2) How big are your documents? e.g. how many terms, etc.
>>>
>> The last one built has over 4M terms.
>>
>>> 3) Are you serving (searching) the docs in realtime?
>>>
>> I don't understand this question, but searching is slower if I am indexing
>> on a disk that is also being searched.
>>
>>>
>>> 4) search speed?
>>>
>> Usually sub-second (or close) after some warmup. While this might seem slow,
>> it's fast compared to the competition, trust me.
>>
>>>
>>> I'd love to learn more about your architecture.
>>>
>> I hate to say you would be disappointed, but there's nothing fancy.
>> Probably why it works...
>>
>>>
>>> -John
>>>
>>>
>>> On Wed, Dec 3, 2008 at 10:13 PM, Robert Muir <[EMAIL PROTECTED]> wrote:
>>>
>>>> Sorry, gotta speak up on this. I indexed 300M docs today. I'm using an
>>>> out-of-the-box jar.
>>>>
>>>> Yeah, I have some special subclasses, but if I thought any of this stuff
>>>> was general enough to be useful to others I'd submit it. I'm just happy to
>>>> have something scalable that I can customize to my peculiarities.
>>>>
>>>> So I think I fit in your 10%, and I'm not stressing on either scalability
>>>> or the API.
>>>>
>>>> thanks,
>>>> robert
>>>>
>>>>
>>>> On Thu, Dec 4, 2008 at 12:36 AM, John Wang <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Grant:
>>>>>         I am sorry, but I disagree with some points:
>>>>>
>>>>> 1) "I think it's a sign that Lucene is pretty stable." - While lucene
>>>>> is a great project, especially with 2.x releases, great improvements are
>>>>> made, but do we really have a clear picture on how lucene is being used 
>>>>> and
>>>>> deployed. While lucene works great running as a vanilla search library, 
>>>>> when
>>>>> pushed to limits, one needs to "hack" into lucene to make certain things
>>>>> work. If 90% of the user base use it to build small indexes and using the
>>>>> vanilla api, and the other 10% is really stressing both on the scalability
>>>>> and api side and are running into issues, would you still say: "running 
>>>>> well
>>>>> for 90% of the users, therefore it is stable or extensible"? I think it is
>>>>> unfair to the project itself to be measured by the vanilla use-case. I 
>>>>> have
>>>>> done couple of large deployments, e.g. >30 million documents indexed and
>>>>> searched in realtime., and I really had to do some tweaking.
>>>>>
>>>>>
>>>>
>>>> --
>>>> Robert Muir
>>>> [EMAIL PROTECTED]
>>>>
>>>
>>>
>>
>>
>> --
>> Robert Muir
>> [EMAIL PROTECTED]
>>
>
>


-- 
Robert Muir
[EMAIL PROTECTED]
