[Lucene.Net] [jira] [Commented] (LUCENENET-417) implement streams as field values

Troy Howard (JIRA) Wed, 25 May 2011 17:19:33 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENENET-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039443#comment-13039443
 ]


Troy Howard commented on LUCENENET-417:
---------------------------------------

Chris's goal here is to prevent large blobs from being placed in memory either 
as binary data or as string data. This is to prevent OOM exceptions on very 
large documents. Using Stream semantics, you can avoid this. 

The limitation that TextReader values cannot be stored comes from the 
TextReader type being forward-only, which is a consequence of how Encodings 
work, not of some fundamental conflict with Lucene's business rules. 
There is no reason you should not be able to provide a resettable Stream and 
an Encoding and perform the same operations, resetting the stream between the 
tokenization and value-storage stages. 
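A minimal sketch of the reset-between-stages idea follows. It is written in Java rather than C#: Java's InputStream mark/reset stands in for a seekable .NET Stream, and Charset stands in for Encoding. The class name TwoPassField and the whitespace tokenizer are hypothetical stand-ins for a Lucene Field and analyzer, not Lucene API. Both stages read the value in fixed-size chunks, so it is never materialized as one string or byte array.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration, not Lucene's API.
class TwoPassField {
    // Stage 1: decode bytes with the given charset and emit whitespace-separated
    // tokens, reading 4 KB chunks instead of the whole value at once.
    // (Whitespace splitting stands in for a real analyzer.)
    static void tokenize(InputStream in, Charset cs, Consumer<String> sink) throws IOException {
        Reader reader = new InputStreamReader(in, cs); // not closed: that would close 'in'
        StringBuilder token = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        while ((n = reader.read(buf)) != -1) {
            for (int i = 0; i < n; i++) {
                if (Character.isWhitespace(buf[i])) {
                    flush(token, sink);
                } else {
                    token.append(buf[i]);
                }
            }
        }
        flush(token, sink);
    }

    private static void flush(StringBuilder token, Consumer<String> sink) {
        if (token.length() > 0) {
            sink.accept(token.toString());
            token.setLength(0);
        }
    }

    // Stage 2: copy the raw (still encoded) bytes to the stored-value output in chunks.
    static long store(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[4096];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        return total;
    }

    // Run both stages over ONE stream, rewinding it in between.
    static long index(InputStream in, Charset cs, Consumer<String> sink, OutputStream stored)
            throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(Integer.MAX_VALUE); // remember the start of the value
        tokenize(in, cs, sink);
        in.reset();                 // rewind before the storage stage
        return store(in, stored);
    }

    public static void main(String[] args) throws IOException {
        byte[] value = "large streamed value".getBytes(StandardCharsets.UTF_8);
        List<String> tokens = new ArrayList<>();
        ByteArrayOutputStream stored = new ByteArrayOutputStream();
        long n = index(new ByteArrayInputStream(value), StandardCharsets.UTF_8, tokens::add, stored);
        System.out.println(tokens + " " + n);
    }
}
```

The ByteArrayInputStream is only a demo source. Mark/reset avoids buffering only when the underlying source can rewind natively, such as a seekable file. A BufferedInputStream, by contrast, would have to buffer the entire value to honor the mark.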

The only issue would be multi-threading: if tokenization and value storage 
were happening at the same time, they could not operate against the same 
stream.
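Running the two stages concurrently would require each stage to have its own stream, because a single stream has only one read position. The Java sketch below assumes a hypothetical StreamOpener callback that returns a fresh stream over the same content on every call, for example by opening the same file twice. The class and interface names are illustrative only.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.Reader;
import java.nio.charset.Charset;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration, not Lucene's API.
class ConcurrentStages {
    // Assumed contract: each call returns a NEW stream positioned at the start
    // of the same content, so the two stages never share a read position.
    interface StreamOpener {
        InputStream open() throws IOException;
    }

    // Tokenization stage (simplified): count whitespace-separated tokens.
    static int countTokens(InputStream in, Charset cs) throws IOException {
        try (Reader r = new InputStreamReader(in, cs)) {
            int count = 0;
            boolean inToken = false;
            int c;
            while ((c = r.read()) != -1) {
                boolean ws = Character.isWhitespace(c);
                if (!ws && !inToken) {
                    count++;
                }
                inToken = !ws;
            }
            return count;
        }
    }

    // Storage stage: stream the raw bytes to the output (InputStream.transferTo, Java 9+).
    static long copyAll(InputStream in, OutputStream out) throws IOException {
        try (InputStream src = in) {
            return src.transferTo(out);
        }
    }

    // Both stages run at the same time, each on its own independently opened stream.
    static long[] runConcurrently(StreamOpener opener, Charset cs, OutputStream stored)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<Integer> tokens = pool.submit(() -> countTokens(opener.open(), cs));
            Future<Long> bytes = pool.submit(() -> copyAll(opener.open(), stored));
            return new long[] { tokens.get(), bytes.get() };
        } finally {
            pool.shutdown();
        }
    }
}
```

Sharing one stream between the two tasks would interleave their reads, and each stage would see only part of the value.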

> implement streams as field values
> ---------------------------------
>
>                 Key: LUCENENET-417
>                 URL: https://issues.apache.org/jira/browse/LUCENENET-417
>             Project: Lucene.Net
>          Issue Type: New Feature
>          Components: Lucene.Net Core
>            Reporter: Christopher Currens
>         Attachments: BinaryStream.patch
>
>
> Adding binary values to a field is an expensive operation, as all of the 
> binary data must be loaded into memory and then written to the index. Adding 
> the ability to use a stream instead of a byte array could not only speed up 
> the indexing process but also reduce the memory footprint.
> -Java Lucene has the ability to use a TextReader to both analyze and store 
> text in the index.- Lucene.Net lacks the ability to store string data in the 
> index via streams. This feature should be added to Lucene.Net as well.
> My proposal is to add another Field constructor, Field(string name, 
> System.IO.Stream stream, System.Text.Encoding encoding), that will allow the 
> text to be both analyzed and stored in the index.
> Comments about this approach are greatly appreciated.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
