Re: Fieldable, AbstractField, Field

Michael McCandless Wed, 19 Mar 2008 12:50:49 -0700


Chris Hostetter wrote:

: I do like moving towards a separation of Document for indexing vs
: searching for 3.0.
:
: Disregarding for starters how we get there from here...
:
: Wouldn't we just want a base class (not an interface), say
: ReadOnlyField, that is used in documents retrieved by a reader?  This
: class would also have Index.*, Store.*, TermVector.*, and
: isStored/Indexed/Tokenized/Compressed, etc, as these are recoverable
: from an index.  Couldn't this be a concrete class, ie, the actual
: class instantiated when a Document is loaded from a reader?

Yes, but one of the peeves I've heard lots of people express over
the years is that they want to "decorate" the Documents returned by a
search, so that they can make those documents access alternate field
stores and metadata not in the index.  (LUCENE-778 started out being a
discussion of wanting to pass custom subclasses of Document to
writer.addDocument(), but it also mentions wanting to get custom documents
back from IndexReader.)

Imagine you're writing an app that does a search with Lucene, and then
returns a List<Document> ...

  public List<Document> myMethod(options) {
    List<Document> docs = doSomeSearchStuff(indexreader, query, options);
    return docs;
  }

you've got a lot of downstream code that calls myMethod and uses/propagates
this List<Document> ... and then one day you decide that for each document
you want to also include some metadata that Lucene doesn't know anything
about.  Your downstream client code is happy to treat this new metadata
just like any other field.  You could change the API of myMethod and jump
through a lot of hoops changing all of your other code; or if "Document"
is a simple interface, you could do something like...

  public class MyDocumentWrapper implements Document {
    public MyDocumentWrapper(Document doc, otherData) {...}
    public static List<Document> wrapList(List<Document> docs, otherData) {...}
  }
  public List<Document> myMethod(options) {
    List<Document> docs = doSomeSearchStuff(indexreader, query, options);
    return MyDocumentWrapper.wrapList(docs, getOtherData(options));
  }

(If I remember right, there are some comments to this effect in LUCENE-778
as well.)

Wouldn't subclassing ReadOnlyDocument also work in this case, if you
override the getField* methods to do your own new logic where it applies
and otherwise fall back to super?
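
A minimal sketch of that subclassing idea, assuming a hypothetical
ReadOnlyDocument with a single get(String) accessor.  Neither the class
nor the accessor exists in Lucene as written here, and the "extra"
metadata map is made up purely for illustration:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the read-only document class discussed in
// this thread; it is not an existing Lucene class.
class ReadOnlyDocument {
    private final Map<String, String> stored = new HashMap<>();

    ReadOnlyDocument(Map<String, String> storedFields) {
        stored.putAll(storedFields);
    }

    // Returns the stored value, or null if the field is absent.
    public String get(String name) {
        return stored.get(name);
    }
}

// Adds metadata the index knows nothing about; any field it does not
// recognize falls back to the superclass lookup.
class DecoratedDocument extends ReadOnlyDocument {
    private final Map<String, String> extra;

    DecoratedDocument(Map<String, String> storedFields, Map<String, String> extra) {
        super(storedFields);
        this.extra = new HashMap<>(extra);
    }

    @Override
    public String get(String name) {
        String value = extra.get(name);
        return value != null ? value : super.get(name);
    }
}
```

Downstream code that only holds a ReadOnlyDocument reference would see
the extra metadata as if it were any other field.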

Alternatively ... we back away from distinguishing read-only vs
index-time Document (and go back to a single concrete Field class).
This way you can alter the fields of a Document returned from a
reader.  I agree it's not clear that forcing "read only" on a
Document returned by a reader is the right approach.  People who are
careful (store enough fields, don't use boosting or have a separate
store for their boosting) could pull Documents from a reader, tweak
them, and build a new index.

: And then a subclass, IndexableField, that adds reader & tokenStream
: values, get/set boost, setters to change a field's value, etc.

IndexableField really shouldn't be a subclass of whatever class is
returned after a search is done ... the methods used for accessing the
"stored" value of a returned document make as little sense in the
context of IndexableField as the setBoost/Reader/TokenStream functions of
Document currently make when a search is executed.

when all is said and done: an IndexableField and a SearchResultField
shouldn't have anything in common except *maybe* that they both have a
fieldName.

Actually I think they do share a lot more than just the name of the
field?  Accessing the "stored" value of a document is exactly what
indexing needs to do when it indexes the document in the first
place?  Ie, a "stored" document "looks a lot like" the document at
indexing time that had been stored.  And things like isTokenized,
isTermVectorStored, isStoreOffsetWithTermVector, isBinary are
actually preserved in the index and known to the reader, so it's
worth having these methods available at search time?
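
To make the shared surface concrete, here is a rough sketch of the
hierarchy under discussion.  ReadOnlyField and IndexableField are only
names proposed in this thread, not Lucene classes, and the constructor
arguments and method bodies below are assumptions for illustration:

```java
// Hypothetical base class: everything here is recoverable from the index,
// so it makes sense both at indexing time and on documents loaded by a reader.
class ReadOnlyField {
    private final String name;
    protected String value;
    private final boolean stored;
    private final boolean indexed;
    private final boolean tokenized;

    ReadOnlyField(String name, String value,
                  boolean stored, boolean indexed, boolean tokenized) {
        this.name = name;
        this.value = value;
        this.stored = stored;
        this.indexed = indexed;
        this.tokenized = tokenized;
    }

    public String name() { return name; }
    public String stringValue() { return value; }
    public boolean isStored() { return stored; }
    public boolean isIndexed() { return indexed; }
    public boolean isTokenized() { return tokenized; }
}

// Hypothetical indexing-time subclass: adds only what a reader cannot
// give back, here a value setter and a boost.
class IndexableField extends ReadOnlyField {
    private float boost = 1.0f;

    IndexableField(String name, String value,
                   boolean stored, boolean indexed, boolean tokenized) {
        super(name, value, stored, indexed, tokenized);
    }

    public void setValue(String newValue) { this.value = newValue; }
    public void setBoost(float newBoost) { this.boost = newBoost; }
    public float getBoost() { return boost; }
}
```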

I think Yonik once argued that the ideal API for getting a Document out of
an IndexReader would be...

   /** @return map of field name to field values */
   public Map<String,String[]> getDocument(int id)

But that would lose the above is* methods and often would force
applications to wrap that returned result in a new class anyway...
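
As a sketch of the kind of wrapper an application might end up writing
around such a map (MapDocument and its tokenizedFields set are
hypothetical, invented here only to show that the flags would have to
come from somewhere other than the map itself):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical wrapper: pairs the bare Map<String,String[]> result with
// per-field flags that the map alone cannot carry.
class MapDocument {
    private final Map<String, String[]> values;
    private final Set<String> tokenizedFields;

    MapDocument(Map<String, String[]> values, Set<String> tokenizedFields) {
        this.values = new HashMap<>(values);
        this.tokenizedFields = new HashSet<>(tokenizedFields);
    }

    // All values for the field, or an empty array when the field is absent.
    public String[] getValues(String field) {
        String[] v = values.get(field);
        return v != null ? v.clone() : new String[0];
    }

    // This flag has to be supplied separately from the map.
    public boolean isTokenized(String field) {
        return tokenizedFields.contains(field);
    }
}
```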


Mike

