Support adding a "stored" field via a Reader
--------------------------------------------

                 Key: LUCENE-1757
                 URL: https://issues.apache.org/jira/browse/LUCENE-1757
             Project: Lucene - Java
          Issue Type: Wish
          Components: Index
            Reporter: Tim Smith


All current Field constructors that take a Reader explicitly state that the 
resulting field will not be stored.

It would be highly desirable to support adding a stored field to a Document 
using a Reader (or some special interface that can go directly to the source 
data).

This could greatly reduce the memory required to add very large stored fields 
(provided IndexWriter streams the data efficiently).

This will support two primary use cases:

1. Create a stored field from an arbitrary CharSequence

I may internally use a MutableString-type class during document processing to 
conserve memory. However, I currently have to convert it to a String before 
adding it as a stored field. If I could instead pass a Reader over this mutable 
string/char sequence, indexing could be smart enough not to allocate double the 
space.
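As an illustration of use case 1: the JDK's StringReader only accepts a String, 
so a Reader over an arbitrary CharSequence has to be written by hand. The sketch 
below is hypothetical (CharSequenceReader is my own name, not an existing Lucene 
API); it copies characters out one chunk at a time, so no second full-size 
String is ever allocated.

{code:java}
import java.io.IOException;
import java.io.Reader;

/**
 * Hypothetical Reader over any CharSequence (e.g. a StringBuilder or a
 * MutableString-type class). Characters are copied out one chunk per read()
 * call, so no full-size String copy is made. The source must not be modified
 * while it is being read.
 */
final class CharSequenceReader extends Reader {
    private final CharSequence source;
    private int pos = 0;            // index of the next char to hand out
    private boolean closed = false;

    CharSequenceReader(CharSequence source) {
        if (source == null) {
            throw new NullPointerException("source");
        }
        this.source = source;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (closed) {
            throw new IOException("Reader closed");
        }
        if (off < 0 || len < 0 || len > buf.length - off) {
            throw new IndexOutOfBoundsException();
        }
        if (len == 0) {
            return 0;                                  // Reader contract for empty reads
        }
        int remaining = source.length() - pos;
        if (remaining <= 0) {
            return -1;                                 // end of stream
        }
        int n = Math.min(len, remaining);
        for (int i = 0; i < n; i++) {
            buf[off + i] = source.charAt(pos + i);     // copy just this chunk
        }
        pos += n;
        return n;
    }

    @Override
    public void close() {
        closed = true;
    }
}
{code}

An indexer handed such a Reader would only ever hold its own read buffer in 
addition to the caller's mutable string.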

2. Create a stored field from a file on disk

When adding large stored fields, the actual value may be kept on disk to reduce 
memory use during indexing. To use such a value as a stored field today, it must 
first be loaded entirely into memory as a String/byte[] so it can be added to a 
Field (this could be quite large and trigger an OutOfMemoryError).
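For use case 2, the memory saving comes from copying the value through a small 
fixed buffer instead of loading it whole. The sketch below is hypothetical: 
StreamingStoredFieldWriter and its 4-byte length-prefix layout are assumptions 
for illustration, not Lucene's actual stored-fields format or IndexWriter code.

{code:java}
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical sketch: copy a stored value from a file into an index output
 * without holding the whole value in memory. Peak memory is the fixed copy
 * buffer, independent of the field size.
 */
final class StreamingStoredFieldWriter {
    private static final int BUFFER_SIZE = 8192;

    /** Writes a 4-byte length prefix (assumed layout) followed by the file's bytes. */
    static void writeStoredValue(Path file, OutputStream indexOut) throws IOException {
        long size = Files.size(file);
        if (size > Integer.MAX_VALUE) {
            throw new IOException("stored value too large: " + size + " bytes");
        }
        DataOutputStream out = new DataOutputStream(indexOut);
        out.writeInt((int) size);                      // length known before copying
        byte[] buf = new byte[BUFFER_SIZE];
        long remaining = size;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while (remaining > 0
                    && (n = in.read(buf, 0, (int) Math.min(buf.length, remaining))) != -1) {
                out.write(buf, 0, n);                  // stream one chunk at a time
                remaining -= n;
            }
        }
        if (remaining != 0) {
            throw new IOException("file shrank while being copied");
        }
        out.flush();                                   // do not close: caller owns indexOut
    }
}
{code}

Writing the length before the data only works because the size is known up 
front, which is also why a size() method is useful on any replacement for 
Reader.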


Document retrieval considerations:

It would then also be ideal if, when fetching a Document from the index, you 
could specify a "max string size" for the returned stored fields. If a field 
were larger than this cutoff, a Reader going directly to disk would be returned 
instead of a String/byte[]. This would again allow smart applications to save 
memory during document retrieval (this would be especially nice for 
highlighting, as the source data could be streamed right into the 
highlighter).
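The retrieval cutoff could look roughly like the following. Everything here is 
hypothetical (StoredValueSource, RetrievedValue and fetch() are illustrative 
names, not Lucene APIs), and it assumes the index records each stored value's 
length in chars, so the decision can be made without reading the value.

{code:java}
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/** Hypothetical handle to a stored value that still lives in the index. */
interface StoredValueSource {
    /** Length of the stored value in chars, known without reading it (assumption). */
    int length();

    /** Opens a fresh Reader over the value; the caller must close it. */
    Reader open() throws IOException;

    /** In-memory implementation, for demonstration only. */
    static StoredValueSource inMemory(String value) {
        return new StoredValueSource() {
            @Override public int length() { return value.length(); }
            @Override public Reader open() { return new StringReader(value); }
        };
    }
}

/** Result of a retrieval: either a loaded String or an open Reader, never both. */
final class RetrievedValue {
    private final String text;    // set when the value fit under the cutoff
    private final Reader reader;  // set when the value is streamed instead

    private RetrievedValue(String text, Reader reader) {
        this.text = text;
        this.reader = reader;
    }

    boolean isLoaded() { return text != null; }

    /** The loaded value, or null if the value is being streamed. */
    String text() { return text; }

    /** The open Reader (caller must close it), or null if the value was loaded. */
    Reader reader() { return reader; }

    /** Loads values up to maxStringSize chars; larger ones come back as a Reader. */
    static RetrievedValue fetch(StoredValueSource src, int maxStringSize) throws IOException {
        int len = src.length();
        if (len > maxStringSize) {
            return new RetrievedValue(null, src.open());   // stream straight from the source
        }
        char[] buf = new char[len];
        int filled = 0;
        try (Reader r = src.open()) {
            while (filled < len) {
                int n = r.read(buf, filled, len - filled);
                if (n == -1) {
                    throw new IOException("stored value shorter than its recorded length");
                }
                filled += n;
            }
        }
        return new RetrievedValue(new String(buf), null);
    }
}
{code}

A highlighter could then check isLoaded() and, for large values, consume 
reader() incrementally instead of receiving one huge String.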


It would also be acceptable for the API to take some new interface instead of a 
Reader. This could be some form of "sized" input stream that reports the total 
number of bytes/chars it will produce.
For example:
{code}
import java.io.InputStream;
import java.io.Reader;

public interface FieldSource {
  /**
   * Size of the stored field value: in bytes if isBinary() is true,
   * in chars if isBinary() is false.
   */
  int size();

  /** If true, use getInputStream(); if false, use getReader(). */
  boolean isBinary();

  /**
   * Get the input stream for pulling this value from its source
   * (null if isBinary() is false).
   */
  InputStream getInputStream();

  /** Get the reader for reading character data (null if isBinary() is true). */
  Reader getReader();
}
{code}
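To show how the size()/isBinary() contract might be consumed on the indexing 
side, here is a hypothetical writer. FieldSourceWriter and FieldSources are 
illustrative names, and the flag + length + content layout is an assumption, 
not Lucene's file format. The interface is repeated so the snippet compiles on 
its own.

{code:java}
import java.io.ByteArrayInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;

/** The FieldSource interface as proposed (repeated so this compiles alone). */
interface FieldSource {
    int size();
    boolean isBinary();
    InputStream getInputStream();
    Reader getReader();
}

/** Hypothetical writer showing how an indexer might consume a FieldSource. */
final class FieldSourceWriter {
    private static final int BUFFER_SIZE = 4096;

    static void writeStored(FieldSource src, DataOutputStream out) throws IOException {
        out.writeBoolean(src.isBinary());   // type flag
        out.writeInt(src.size());           // length header, known before streaming
        long written = 0;
        if (src.isBinary()) {
            byte[] buf = new byte[BUFFER_SIZE];
            try (InputStream in = src.getInputStream()) {
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                    written += n;
                }
            }
        } else {
            char[] buf = new char[BUFFER_SIZE];
            try (Reader in = src.getReader()) {
                int n;
                while ((n = in.read(buf)) != -1) {
                    for (int i = 0; i < n; i++) {
                        out.writeChar(buf[i]);  // 2 bytes per UTF-16 code unit
                    }
                    written += n;
                }
            }
        }
        if (written != src.size()) {
            // The header already promised size() units, so report the mismatch.
            throw new IOException("FieldSource declared " + src.size()
                    + " units but produced " + written);
        }
    }
}

/** Simple in-memory FieldSources, for demonstration only. */
final class FieldSources {
    static FieldSource ofBytes(byte[] data) {
        return new FieldSource() {
            @Override public int size() { return data.length; }
            @Override public boolean isBinary() { return true; }
            @Override public InputStream getInputStream() { return new ByteArrayInputStream(data); }
            @Override public Reader getReader() { return null; }
        };
    }

    static FieldSource ofString(String value) {
        return new FieldSource() {
            @Override public int size() { return value.length(); }
            @Override public boolean isBinary() { return false; }
            @Override public InputStream getInputStream() { return null; }
            @Override public Reader getReader() { return new StringReader(value); }
        };
    }
}
{code}

Knowing size() up front is what lets the writer emit the length header before 
streaming; a source that then produces a different number of units is reported 
as an error instead of being silently accepted.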



