Chris Hostetter wrote:

: I do like moving towards a separation of Document for indexing vs
: searching for 3.0.
:
: Disregarding for starters how we get there from here...
:
: Wouldn't we just want a base class (not an interface), say
: ReadOnlyField, that is used in documents retrieved by a reader? This
: class would also have Index.*, Store.*, TermVector.*, and
: isStored/Indexed/Tokenized/Compressed, etc, as these are recoverable
: from an index.  Couldn't this be a concrete class, ie, the actual
: class instantiated when a Document is loaded from a reader?

Yes, but one of the peeves I've heard lots of people express over
the years is that they want to "decorate" the Documents returned by a
search, so that they can make those documents access alternate field
stores and metadata not in the index.  (LUCENE-778 started out being a
discussion of wanting to pass custom subclasses of Document to
writer.addDocument(), but it also mentions wanting to get custom documents
back from IndexReader.)

Imagine you're writing an app that does a search with Lucene, and then
returns a List<Document> ...

  public List<Document> myMethod(options) {
    List<Document> docs = doSomeSearchStuff(indexreader, query, options);
    return docs;
  }

You've got a lot of downstream code that calls myMethod and uses/propagates
this List<Document> ... and then one day you decide that for each document
you want to also include some metadata that Lucene doesn't know anything
about. Your downstream client code is happy to treat this new metadata
just like any other field.  You could change the API of myMethod and jump
through a lot of hoops changing all of your other code; or if "Document"
is a simple interface, you could do something like...

  public class MyDocumentWrapper implements Document {
    public MyDocumentWrapper(Document doc, otherData) {...}
    public static List<Document> wrapList(List<Document> docs, otherData) {...}
  }
  public List<Document> myMethod(options) {
    List<Document> docs = doSomeSearchStuff(indexreader, query, options);
    return MyDocumentWrapper.wrapList(docs, getOtherData(options));
  }

(If I remember right, there are some comments to this effect in LUCENE-778
as well.)
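
For what it's worth, here is a minimal, self-contained sketch of that decorator idea. It assumes a hypothetical Document interface with a single get(String) method; that is not Lucene's actual API (Document is a concrete class there), and StoredDocument and the Map-based otherData are stand-ins invented purely for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal interface, for illustration only (not Lucene's API).
interface Document {
    String get(String fieldName);
}

// Hypothetical stand-in for a document loaded from the index.
class StoredDocument implements Document {
    private final Map<String, String> stored;

    StoredDocument(Map<String, String> stored) {
        this.stored = new HashMap<>(stored);
    }

    @Override
    public String get(String fieldName) {
        return stored.get(fieldName);
    }
}

// Decorator: asks the wrapped document first, then falls back to the
// extra metadata that the index knows nothing about.
class MyDocumentWrapper implements Document {
    private final Document delegate;
    private final Map<String, String> otherData;

    MyDocumentWrapper(Document delegate, Map<String, String> otherData) {
        this.delegate = delegate;
        this.otherData = otherData;
    }

    @Override
    public String get(String fieldName) {
        String value = delegate.get(fieldName);
        return value != null ? value : otherData.get(fieldName);
    }

    static List<Document> wrapList(List<Document> docs, Map<String, String> otherData) {
        List<Document> wrapped = new ArrayList<>(docs.size());
        for (Document doc : docs) {
            wrapped.add(new MyDocumentWrapper(doc, otherData));
        }
        return wrapped;
    }
}

class DecoratorSketch {
    public static void main(String[] args) {
        Map<String, String> stored = new HashMap<>();
        stored.put("title", "hello");
        Map<String, String> extra = new HashMap<>();
        extra.put("rating", "5");

        List<Document> docs = new ArrayList<>();
        docs.add(new StoredDocument(stored));

        // Downstream code sees only List<Document>, as before.
        for (Document doc : MyDocumentWrapper.wrapList(docs, extra)) {
            System.out.println(doc.get("title") + " / " + doc.get("rating"));
        }
    }
}
```

The point of the sketch is only that callers holding a List<Document> need no changes; whether the index or the extra metadata should win on a name clash is a choice the wrapper's author would have to make.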

Wouldn't subclassing ReadOnlyDocument also work in this case, if you override the getField* methods to apply your own logic where it applies and otherwise fall back to super?
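
A rough sketch of what I mean, under the assumption that ReadOnlyDocument is a concrete class exposing a get(String) accessor. Both the accessor and the MetadataDocument subclass are hypothetical names used only for illustration; none of this exists in Lucene today.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical base class standing in for what a reader would instantiate.
class ReadOnlyDocument {
    private final Map<String, String> stored;

    ReadOnlyDocument(Map<String, String> stored) {
        this.stored = new HashMap<>(stored);
    }

    public String get(String fieldName) {
        return stored.get(fieldName);
    }
}

// Hypothetical subclass: answers from its own metadata when it can,
// otherwise defers to the stored fields via super.
class MetadataDocument extends ReadOnlyDocument {
    private final Map<String, String> otherData;

    MetadataDocument(Map<String, String> stored, Map<String, String> otherData) {
        super(stored);
        this.otherData = new HashMap<>(otherData);
    }

    @Override
    public String get(String fieldName) {
        String extra = otherData.get(fieldName);
        return extra != null ? extra : super.get(fieldName);
    }
}
```

In this sketch the subclass has to be built from the stored values, so an application would presumably either copy each document the reader hands back or need some way to tell the reader which class to instantiate.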

Alternatively ... we back away from distinguishing read-only vs. index-time Documents (and go back to a single concrete Field class). This way you can alter the fields of a Document returned from a reader. I agree it's not clear that forcing "read only" on a Document returned by a reader is the right approach. People who are careful (store enough fields, and either don't use boosting or keep a separate store for their boosts) could pull Documents from a reader, tweak them, and build a new index.

: And then a subclass, IndexableField, that adds reader & tokenStream
: values, get/set boost, setters to change a field's value, etc.

IndexableField really shouldn't be a subclass of whatever class is
returned after a search is done ... the methods used for accessing the
"stored" value of a returned document make as little sense in the
context of IndexableField as the setBoost/Reader/TokenStream functions of
Document currently make when a search is executed.

when all is said and done: an IndexableField and a SearchResultField
shouldn't have anything in common except *maybe* that they both have a
fieldName.

Actually, I think they share a lot more than just the name of the field. Accessing the "stored" value of a document is exactly what indexing needs to do when it indexes the document in the first place. I.e., a "stored" document looks a lot like the document as it was at indexing time. And things like isTokenized, isTermVectorStored, isStoreOffsetWithTermVector, and isBinary are actually preserved in the index and known to the reader, so isn't it worth having these methods available at search time?

I think Yonik once argued that the ideal API for getting a Document out of
an IndexReader would be...

   /** @return map of field name to field values */
   public Map<String,String[]> getDocument(int id)

But that would lose the above is* methods and often would force applications to wrap that returned result in a new class anyway...
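
To illustrate the wrapping point, here is a sketch of the kind of class an application might end up writing around a Map<String,String[]> result. MapBackedDocument and its two accessors are hypothetical names of my own; note that there is nowhere in it to ask whether a field was tokenized or binary, since the map carries values only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical application-side wrapper around a map of field name to values.
class MapBackedDocument {
    private final Map<String, String[]> fields;

    MapBackedDocument(Map<String, String[]> fields) {
        this.fields = new HashMap<>(fields);
    }

    // First value of the field, or null if the field is absent or empty.
    String get(String fieldName) {
        String[] values = fields.get(fieldName);
        return (values == null || values.length == 0) ? null : values[0];
    }

    // All values of the field, never null. The array is cloned so callers
    // cannot mutate the array held by this wrapper.
    String[] getValues(String fieldName) {
        String[] values = fields.get(fieldName);
        return values == null ? new String[0] : values.clone();
    }
}
```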

Mike
