[ 
https://issues.apache.org/jira/browse/LUCENENET-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049856#comment-13049856
 ] 

Christopher Currens commented on LUCENENET-417:
-----------------------------------------------

That's a valid question.  I think it's most common (though not limited to) 
cases where Lucene is used to index file systems.  As an example, text 
extracted from some xls files can be *shudder* in the hundreds of MB.  When 
accuracy is needed in a search, MaxFieldLength.Unlimited becomes important, as 
we don't want terms silently truncated from the index.  The idea of streaming 
it, as I said before, was more about controlling _program memory_, especially 
when multiple indexes are read/written at the same time, than about the 
ability to index a large file at all.  Granted, there are other ways to solve 
the problem, such as what you suggested: breaking a larger file into smaller 
chunks.  However, not all data is divisible the way a book is, so it's not an 
ideal solution, especially if you're storing file metadata along with the 
full text.
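
To make the memory argument concrete, here is a rough, language-neutral sketch 
in plain Python.  It is not the Lucene.NET API and not the attached patch; the 
names iter_decoded_chunks and count_tokens are made up for illustration, and 
the whitespace tokenizer is only a stand-in for a real analyzer.  It assumes a 
byte stream plus an encoding (the same two inputs as the proposed constructor) 
and shows text being consumed in bounded chunks instead of being loaded whole.

```python
import codecs
import io
from typing import BinaryIO, Iterator


def iter_decoded_chunks(stream: BinaryIO, encoding: str = "utf-8",
                        chunk_size: int = 64 * 1024) -> Iterator[str]:
    """Yield decoded text from a byte stream, one bounded chunk at a time."""
    # An incremental decoder keeps partial multi-byte sequences between calls,
    # so a character split across two reads is still decoded correctly.
    decoder = codecs.getincrementaldecoder(encoding)()
    while True:
        raw = stream.read(chunk_size)
        if not raw:
            break
        text = decoder.decode(raw)
        if text:
            yield text
    # Flush anything the decoder is still holding at end of stream.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


def count_tokens(stream: BinaryIO, encoding: str = "utf-8",
                 chunk_size: int = 64 * 1024) -> int:
    """Count whitespace-separated tokens without holding the full text."""
    count = 0
    carry = ""  # trailing partial token left over from the previous chunk
    for text in iter_decoded_chunks(stream, encoding, chunk_size):
        text = carry + text
        parts = text.split()
        # If the chunk ends mid-token, hold the last piece for the next round.
        if parts and not text[-1].isspace():
            carry = parts.pop()
        else:
            carry = ""
        count += len(parts)
    if carry:
        count += 1
    return count


if __name__ == "__main__":
    # Hypothetical stand-in for a large extracted document.
    data = ("some extracted text " * 1000).encode("utf-8")
    print(count_tokens(io.BytesIO(data), chunk_size=4096))
```

Peak memory here is governed by chunk_size plus the longest single token, not 
by the size of the source file, which is the property that matters when 
several indexes are being written at once.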

> implement streams as field values
> ---------------------------------
>
>                 Key: LUCENENET-417
>                 URL: https://issues.apache.org/jira/browse/LUCENENET-417
>             Project: Lucene.Net
>          Issue Type: New Feature
>          Components: Lucene.Net Core
>            Reporter: Christopher Currens
>         Attachments: StreamValues.patch
>
>
> Adding binary values to a field is an expensive operation, as the whole 
> binary data must be loaded into memory and then written to the index.  Adding 
> the ability to use a stream instead of a byte array could not only speed up 
> the indexing process, but also reduce the memory footprint.
> -Java lucene has the ability to use a TextReader to both analyze and store 
> text in the index.-  Lucene.NET lacks the ability to store string data in the 
> index via streams. This should be a feature added into Lucene.NET as well.  
> My thoughts are to add another Field constructor, that is Field(string name, 
> System.IO.Stream stream, System.Text.Encoding encoding), that will allow the 
> text to be analyzed and stored into the index.
> Comments about this approach are greatly appreciated.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
