[ https://issues.apache.org/jira/browse/LUCENE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776329#action_12776329 ]
Jake Mannix commented on LUCENE-1526:
-------------------------------------

bq. Whoa, pretty insane volume.

Aiming for maxing out indexing speed and query throughput at the same time is what we're testing here, and this is a reasonable extreme limit to aim for when stress-testing real-time search.

bq. A handful by pooling the BitVector fixed size bytes arrays (see LUCENE-1574).

Pooling, you say? But what if updates come in too fast to reuse your pool? If you're indexing at the speeds I'm describing, won't you run out of BitVectors in the pool?

bq. I really need a solution that will absolutely not affect query performance from what is today

"You" really need this? Why is the core case for real-time search a scenario where taking a huge hit in indexing throughput is worth a possible gain in query latency? If avoiding a 20% increase in query latency cost you 7x in indexing throughput under heavy indexing, would that be worth it? What about a 10% latency cost versus a 2x throughput loss? These questions aren't easily answered by declaring that real-time search with Lucene needs to _absolutely not affect query performance from what it is today_. Absolute statements like that should be backed up by comparisons based on real performance and load testing. There are many axes of performance to optimize for:

* query latency
* query throughput
* indexing throughput
* index freshness (how fast before documents are visible)

Saying that one of these is absolutely of more importance than the others, without real metrics showing which ones are affected in which ways by different implementation choices, is doing a disservice to the community, and is not by any means "conservative".
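To make the pooling question concrete, here is a minimal sketch of what pooling the fixed-size byte arrays backing a BitVector might look like. All names (ByteArrayPool, acquire, release) are hypothetical, not Lucene APIs; the point is the fallback path when the pool runs dry, which is exactly the failure mode asked about above:

```java
import java.util.ArrayDeque;
import java.util.Arrays;

/**
 * Hypothetical sketch of a bounded pool of fixed-size byte[] arrays,
 * such as those backing a deleted-docs BitVector. Not a Lucene API.
 */
class ByteArrayPool {
    private final ArrayDeque<byte[]> free = new ArrayDeque<>();
    private final int arraySize;
    private final int maxPooled;

    ByteArrayPool(int arraySize, int maxPooled) {
        this.arraySize = arraySize;
        this.maxPooled = maxPooled;
    }

    /** Reuse a pooled array if one is free; otherwise allocate fresh.
     *  If readers hold arrays longer than the update rate allows, the
     *  pool is empty and every acquire falls back to allocation. */
    synchronized byte[] acquire() {
        byte[] a = free.poll();
        return a != null ? a : new byte[arraySize];
    }

    /** Return an array to the pool; drop it if the pool is full. */
    synchronized void release(byte[] a) {
        if (a.length == arraySize && free.size() < maxPooled) {
            Arrays.fill(a, (byte) 0); // clear stale delete bits before reuse
            free.push(a);
        }
    }

    synchronized int pooledCount() {
        return free.size();
    }
}
```

Under a fast enough stream of reopens, acquire() degenerates into plain allocation, so pooling bounds garbage only when readers release arrays at least as fast as new ones are needed.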
> For near real-time search, use paged copy-on-write BitVector impl
> -----------------------------------------------------------------
>
>                 Key: LUCENE-1526
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1526
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: LUCENE-1526.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> SegmentReader currently uses a BitVector to represent deleted docs.
> When performing rapid clone (see LUCENE-1314) and delete operations,
> performing a copy-on-write of the BitVector can become costly because
> the entire underlying byte array must be created and copied. A way to
> make this clone/delete process faster is to implement tombstones, a
> term coined by Marvin Humphrey. Tombstones represent new deletions
> plus the incremental deletions from previously reopened readers in
> the current reader.
> The proposed implementation of tombstones is to accumulate deletions
> into an int array represented as a DocIdSet. With LUCENE-1476,
> SegmentTermDocs iterates over deleted docs using a DocIdSet rather
> than accessing the BitVector by calling get. This allows a BitVector
> and a set of tombstones to be ANDed together as the current reader's
> delete docs.
> A tombstone merge policy needs to be defined to determine when to
> merge tombstone DocIdSets into a new deleted docs BitVector, as too
> many tombstones would eventually be detrimental to performance. A
> probable implementation will merge tombstones based on the number of
> tombstones and the total number of documents in the tombstones. The
> merge policy may be set in the clone/reopen methods or on the
> IndexReader.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
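The tombstone scheme quoted in the issue description can be sketched roughly as follows. This is an illustrative simplification, not the LUCENE-1526 patch: it uses java.util.BitSet in place of Lucene's BitVector, and the class and method names (TombstoneDeletedDocs, maybeMerge, mergeThreshold) are invented for the example:

```java
import java.util.BitSet;
import java.util.TreeSet;

/**
 * Hypothetical sketch: deleted docs as a shared, immutable baseline
 * bit set plus a small per-reader set of new deletions ("tombstones").
 * Recording a delete never copies the baseline; the full copy cost is
 * paid only when tombstones are merged back in.
 */
class TombstoneDeletedDocs {
    private final BitSet base;                       // shared baseline of older deletes
    private final TreeSet<Integer> tombstones = new TreeSet<>();
    private final int mergeThreshold;                // stand-in for a real merge policy

    TombstoneDeletedDocs(BitSet base, int mergeThreshold) {
        this.base = base;
        this.mergeThreshold = mergeThreshold;
    }

    /** Record a new deletion without cloning the baseline bit set. */
    void delete(int docId) {
        tombstones.add(docId);
    }

    /** A doc is deleted if it is in the baseline or in the tombstones. */
    boolean isDeleted(int docId) {
        return base.get(docId) || tombstones.contains(docId);
    }

    /** Fold tombstones into a fresh bit set once they grow too numerous;
     *  until then, return the shared baseline unchanged. */
    BitSet maybeMerge() {
        if (tombstones.size() < mergeThreshold) {
            return base;
        }
        BitSet merged = (BitSet) base.clone();       // the one expensive copy
        for (int doc : tombstones) {
            merged.set(doc);
        }
        return merged;
    }
}
```

The merge threshold here stands in for the merge policy the issue leaves open; checking every tombstone on each isDeleted call is the per-query cost that grows until a merge happens, which is why too many tombstones are detrimental.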
---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org