[jira] Updated: (LUCENE-1313) Realtime Search

Jason Rutherglen (JIRA) Fri, 01 May 2009 11:39:52 -0700

     [ 
https://issues.apache.org/jira/browse/LUCENE-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Jason Rutherglen updated LUCENE-1313:
-------------------------------------

    Attachment: LUCENE-1313.patch

* IndexFileDeleter takes into account the ram directory (which
when using NRT with the FSD caused files to not be found). 

* FSD is included and writes fdx, fdt, tvx, tvf, tvd extension
files to the primary directory (which is the same as
IW.directory). LUCENE-1618 needs to be updated with these
changes (or we simply include it in this patch as the
LUCENE-1618 patch is only a couple of files).

* Removed DocumentsWriter.ramOverLimit

* I think we need to give the option of a ram mergescheduler
because the user may want not want the ram merging and disk
merging to compete for threads. I'm thinking if of the use case
where NRT is a priority then one may allocate more threads to
the ram CMS and less to the disk CMS. This also gives us the
option of trying out more parameters when performing benchmarks
of NRT.

* We may want to default the ram mergepolicy to not use compound
files as it's not useful when using a ram dir?

* Because FSD uses IW.directory, FSD will list files that
originated from FSD and from IW.directory, we may want to keep
track of which files are supposed to be in FSD (from the
underlying primary dir) and which are not?

{quote}If NRT is never used, the behavior of IW should be
unchanged (which is not the case w/ this patch I think). RAMDir
should be created the first time a flush is done due to NRT
creation. {quote}

In the patch if ramdir is not passed in, the behavior of IW
remains the same as it is today. You're saying we should have IW
create the ramdir by default after getReader is called and
remove the IW ramdir constructor? What if the user has an
alternative ramdir implementation they want to use?

{quote}StoredFieldsWriter & TermVectorsTermsWriter now writes to
IndexWriter.getFlushDirectory(), which is confusing because that
method returns the RAMDir if set? Shouldn't this be the
opposite? (Ie it should flush to IndexWriter.getDirectory()? Or
we should change getFlushDiretory to NOT return the
ramdir?){quote}

The attached patch uses FileSwitchDirectory, where these files
are written to the primary directory (IW.directory). So
getFlushDirectory is ok?

{quote}Why did you need to add synchronized to some of the
SegmentInfo files methods? (What breaks if you undo that?). The
contract here is IW protects access to SegmentInfo/s{quote}

SegmentInfo.files was being cleared while sizeInBytes was called
which resulted in an NPE. The alternative is sync IW in
IW.size(SegmentInfos) which seems a bit extreme just to obtain
the size of a segment info?

{quote}The MergePolicy needs some smarts when it's dealing w/
RAM. EG it should not do a merge of more than XXX% of total RAM
usage (should flush to the real directory instead){quote}

Isn't this handled well enough in updatePendingMerges or is
there more that needs to be done?

> Realtime Search
> ---------------
>
>                 Key: LUCENE-1313
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1313
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index
>    Affects Versions: 2.4.1
>            Reporter: Jason Rutherglen
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1313.jar, LUCENE-1313.patch, LUCENE-1313.patch, 
> LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, 
> LUCENE-1313.patch, LUCENE-1313.patch, lucene-1313.patch, lucene-1313.patch, 
> lucene-1313.patch, lucene-1313.patch
>
>
> Realtime search with transactional semantics.  
> Possible future directions:
>   * Optimistic concurrency
>   * Replication
> Encoding each transaction into a set of bytes by writing to a RAMDirectory 
> enables replication.  It is difficult to replicate using other methods 
> because while the document may easily be serialized, the analyzer cannot.
> I think this issue can hold realtime benchmarks which include indexing and 
> searching concurrently.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] Updated: (LUCENE-1313) Realtime Search

Reply via email to