FieldType API

Ryan Ernst (JIRA) Fri, 10 Oct 2014 12:37:51 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-6005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14167374#comment-14167374
 ]


Ryan Ernst commented on LUCENE-6005:
------------------------------------

+1 overall.  This is sorely needed.  I also think we should "level the playing 
field" for trunk: start by getting trunk back to the same state as 5x with the 
document api, so that if this is ready in time for 5.0, it can be much more 
easily backported.

> Explore alternative to Document/Field/FieldType API
> ---------------------------------------------------
>
>                 Key: LUCENE-6005
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6005
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: Trunk
>
>
> Auto-prefix terms (LUCENE-5879) is blocked because it's impossible in
> Lucene today to add a simple API to use it, and I don't think we
> should commit features that only super-experts can figure out how to
> use: that's evil.
> The only realistic "workaround" for such new features is to instead
> add them directly to the various servers on top of Lucene, since they
> all already have nice schema APIs.
> I opened LUCENE-5989 to try do at least a baby step towards making it
> easier to use auto-prefix terms, so you can easily add singleton
> binary tokens, but even that has proven controversial.
> Net/net I think we have to solve the root cause of this by fixing the
> Document/Field/FieldType API so that new index-level features can have
> a usable API, properly defaulted for the right types of fields.
> Towards that, I'm exploring a replacement for
> Document/Field/FieldType.  The idea is to expose simple methods on the
> document class (no more separate Field and FieldType classes):
> {noformat}
>     doc.addLargeText("body", "some text");
>     doc.addShortText("title", "a title");
>     doc.addAtom("id", "29jafnn");
>     doc.addBinary("bytes", new byte[7]);
>     doc.addNumber("number", 17);
> {noformat}
> And then expose a separate FieldTypes class, that you pass to ctor of
> the new document class, which lets you set all the various per-field
> settings (stored, doc values, etc.).  E.g.:
> {noformat}
>     types.enableStored("id");
> {noformat}
> FieldTypes is a write-once schema, and it throws exceptions if you try
> to make invalid changes once a given setting is already written
> (e.g. enabling norms after having disabled them).  It will (I haven't
> implemented this yet) save its state into IndexWriter's commitData, so
> it's available when you open a new IndexWriter for append and when you
> open a reader.
> It has methods to set all the per-field settings (analyzer, stored,
> term vectors, norms, index options, doc values type), and chooses
> "reasonable" defaults based on the value's type when it suddenly sees
> a new field.  For example, when you add a number, it's indexed for
> range querying and sorting (numeric doc values) by default.
> FieldTypes provides the analyzer and codec (a little messy) that you
> pass to IndexWriterConfig.  Since it's effectively a persistent
> schema, it knows all about the available fields at search time, so we
> could use it to create queries (checking if they are valid given that
> field's type).  Query parsers and highlighters could consult it.
> Default UIs (above Lucene) could use it, etc.  This is all future .. I
> think for this issue the goal should be to "just" provide a "better"
> index-time API but not yet make use of it at search time.
> So with this change, for auto-prefix terms, we could add an "enable
> range queries/filters" option, but then validate that the selected
> postings format supports such an option.
> I know this exploration will be horribly controversial, but
> realistically I don't think Lucene can move on much further if we
> can't finally address this schema problem head on.
> This is long overdue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (LUCENE-6005) Explore alternative to Document/Field/FieldType API

Reply via email to