Support adding a "stored" field via a Reader
--------------------------------------------
Key: LUCENE-1757
URL: https://issues.apache.org/jira/browse/LUCENE-1757
Project: Lucene - Java
Issue Type: Wish
Components: Index
Reporter: Tim Smith
All current Field constructors that take a Reader explicitly state that the
field value will not be stored.
It would be highly desirable to support adding a stored field to a Document
using a Reader (or some special interface that can go directly to the source
data). This could greatly reduce the memory required for adding very large
stored fields (if used efficiently by IndexWriter).
This would support two primary use cases:
1. Create a stored field from an arbitrary CharSequence.
I may internally use a MutableString-type class during document processing to
conserve memory; however, I currently have to convert it to a String before
adding it as a stored field. If I could just pass a Reader over this mutable
string/char sequence, indexing could be smart enough not to allocate double
the space.
2. Create a stored field from a file on disk.
When adding large stored fields, the actual value may be kept on disk to reduce
memory use during indexing. To use it as a stored field, it currently has to be
loaded entirely into memory as a String/byte[] before it can be added to a
Field (this could be quite large and provoke an OutOfMemoryError).
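As a rough illustration of use case 1, here is a minimal sketch of a Reader that streams characters out of any CharSequence without first copying it into a String. The class name CharSequenceReader is hypothetical and not part of Lucene; with the current API, passing such a Reader to a Field would still produce a field that is not stored.

```java
import java.io.Reader;

/**
 * Hypothetical helper (not a Lucene class): exposes a CharSequence as a
 * Reader so the value never has to be copied into a String.
 */
final class CharSequenceReader extends Reader {
    private final CharSequence source;
    private int position = 0;

    CharSequenceReader(CharSequence source) {
        this.source = source;
    }

    @Override
    public int read(char[] buffer, int offset, int length) {
        if (length == 0) {
            return 0;
        }
        int remaining = source.length() - position;
        if (remaining <= 0) {
            return -1; // end of stream
        }
        // Copy only as many chars as the caller asked for.
        int count = Math.min(length, remaining);
        for (int i = 0; i < count; i++) {
            buffer[offset + i] = source.charAt(position + i);
        }
        position += count;
        return count;
    }

    @Override
    public void close() {
        // Nothing to release: the caller owns the underlying sequence.
    }
}
```

A consumer that pulls the value in fixed-size chunks through this Reader only ever needs one chunk-sized buffer on top of the original sequence.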
Document retrieval considerations:
It would also be ideal if, when fetching a Document from the index, you could
specify a "max string size" for the returned stored field. If the field were
larger than this cutoff, a Reader going directly to disk would be returned
instead of a String/byte[]. This would again allow smart applications to save
memory during document retrieval (this would be especially nice for
highlighting, as the source data could be streamed right into the
highlighter).
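The cutoff behaviour could look roughly like the sketch below. It is only an illustration of the idea: the class StoredValueLoader is hypothetical, a plain file stands in for the stored field data in the index, UTF-8 is assumed, and the size is measured in bytes rather than chars for simplicity.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper illustrating the "max string size" cutoff. */
final class StoredValueLoader {
    private StoredValueLoader() {}

    /**
     * Returns a String when the value is at most maxStringSize bytes long,
     * otherwise a Reader that streams the value from disk.
     * The caller is responsible for closing a returned Reader.
     */
    static Object load(Path storedValue, long maxStringSize) throws IOException {
        if (Files.size(storedValue) <= maxStringSize) {
            // Small value: materialize it fully in memory.
            return new String(Files.readAllBytes(storedValue), StandardCharsets.UTF_8);
        }
        // Large value: hand back a streaming view instead.
        return Files.newBufferedReader(storedValue, StandardCharsets.UTF_8);
    }
}
```

A caller such as a highlighter would check whether it received a String or a Reader and consume the value accordingly.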
It would also be acceptable if some new interface were accepted instead of
Reader. This could be some form of "sized" input stream that reports the total
number of bytes/chars that will be produced.
Example:
{code}
import java.io.InputStream;
import java.io.Reader;

public interface FieldSource {
    /**
     * Size of the stored field value (in bytes if isBinary() is true,
     * in chars if isBinary() is false).
     */
    public int size();

    /** If true, use getInputStream(); if false, use getReader(). */
    public boolean isBinary();

    /** Get the input stream for pulling the value from its source (null if isBinary() is false). */
    public InputStream getInputStream();

    /** Get the reader for reading character data (null if isBinary() is true). */
    public Reader getReader();
}
{code}
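To make use case 2 concrete, here is a minimal sketch of a binary implementation of the proposed interface backed by a file on disk. FileFieldSource is a hypothetical name, and the interface is repeated (without the public modifier) only so the sketch compiles on its own. Since the proposed methods declare no checked exceptions, I/O failures are wrapped in UncheckedIOException here; that is a choice of this sketch, not part of the proposal.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** The interface proposed in this issue, repeated so the sketch is self-contained. */
interface FieldSource {
    int size();
    boolean isBinary();
    InputStream getInputStream();
    Reader getReader();
}

/** Hypothetical binary FieldSource that reads its value from a file on disk. */
final class FileFieldSource implements FieldSource {
    private final Path file;

    FileFieldSource(Path file) {
        this.file = file;
    }

    @Override
    public int size() {
        try {
            // size() is an int in the proposal, so files over 2 GB are rejected
            // (toIntExact throws ArithmeticException on overflow).
            return Math.toIntExact(Files.size(file));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean isBinary() {
        return true;
    }

    @Override
    public InputStream getInputStream() {
        try {
            // Each call opens a fresh stream; the caller must close it.
            return Files.newInputStream(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public Reader getReader() {
        return null; // binary source: character access is not offered
    }
}
```

Because size() is known up front, a writer could record the length first and then copy the stream in small chunks, never holding the whole value in memory.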