[jira] [Commented] (LUCENE-2308) Separately specify a field's type

Michael McCandless (JIRA) Tue, 07 Jun 2011 03:22:43 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045351#comment-13045351
 ]


Michael McCandless commented on LUCENE-2308:
--------------------------------------------

Thanks for the patch, Nikola!

Note: when you submit patches that you intend to donate to Apache, you
should remember to check the box that says "Grant license to ASF...",
as long as you are the sole creator of that patch (and thus have the
right to grant this patch to the ASF).  Patches that incorporate someone
else's source code are more interesting because we have to ensure the
license is compatible with Apache's, update our LICENSE/NOTICE, etc.

Stepping back here... I think we should think a bit about the end goal
first, and then work out the baby steps to get there.

I think ideally once we are done here, it should be incredibly simple
to create a document, something like this:

{code}
Document d = new Document();
d.add(new TextField(title));
d.add(new StringField(id));
d.add(new BinaryField(bytes));
d.add(new NumericField(price));
{code}

These classes each use a default FieldType under the hood:

  * TextField indexes and tokenizes, with norms and term freqs and
    positions (TFAP)

  * StringField indexes untokenized, with no norms and (maybe) no TFAP

  * BinaryField only stores the byte[]

  * NumericField does what it does today

If an app wants to tweak the type, it can do so, something like this:

{code}
FieldType titleFieldType = new FieldType(TextField.DEFAULT_TYPE);
titleFieldType.setOmitNorms(true);
titleFieldType.setOmitTFAP(true);
d.add(new Field(titleFieldType, title));
{code}

I.e., the default *Field classes are sugar for binding to the common
default type, but you can easily go and customize the type if you want
to.
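
A minimal sketch of how the pieces above could fit together, for
illustration only.  Everything in it is an assumption rather than
existing Lucene API: the flag names, the chained setters, and
especially freeze(), which is added here so that a shared DEFAULT_TYPE
cannot be changed by accident and an app has to copy it before
tweaking.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposal; these are not Lucene's actual classes.
class FieldType {
    private boolean indexed, tokenized, omitNorms, omitTFAP;
    private boolean frozen; // assumption: shared defaults are made read-only

    FieldType() {}

    // Copy constructor: start from an existing type, then tweak the copy.
    FieldType(FieldType other) {
        this.indexed = other.indexed;
        this.tokenized = other.tokenized;
        this.omitNorms = other.omitNorms;
        this.omitTFAP = other.omitTFAP;
        // The copy is deliberately left unfrozen so it can be customized.
    }

    private void checkMutable() {
        if (frozen) {
            throw new IllegalStateException("FieldType is frozen; copy it first");
        }
    }

    FieldType setIndexed(boolean v)   { checkMutable(); indexed = v;   return this; }
    FieldType setTokenized(boolean v) { checkMutable(); tokenized = v; return this; }
    FieldType setOmitNorms(boolean v) { checkMutable(); omitNorms = v; return this; }
    FieldType setOmitTFAP(boolean v)  { checkMutable(); omitTFAP = v;  return this; }
    FieldType freeze()                { frozen = true;                 return this; }

    boolean indexed()   { return indexed; }
    boolean tokenized() { return tokenized; }
    boolean omitNorms() { return omitNorms; }
    boolean omitTFAP()  { return omitTFAP; }
}

// The Field holds the value; the (possibly shared) FieldType holds the settings.
class Field {
    private final FieldType type;
    private final Object value;

    Field(FieldType type, Object value) {
        this.type = type;
        this.value = value;
    }

    FieldType type() { return type; }
    Object value()   { return value; }
}

// Sugar: indexed and tokenized, with norms and TFAP kept.
class TextField extends Field {
    static final FieldType DEFAULT_TYPE =
        new FieldType().setIndexed(true).setTokenized(true).freeze();

    TextField(String value) { super(DEFAULT_TYPE, value); }
}

// Sugar: indexed but untokenized, no norms, no TFAP (the "maybe" case above).
class StringField extends Field {
    static final FieldType DEFAULT_TYPE =
        new FieldType().setIndexed(true).setOmitNorms(true).setOmitTFAP(true).freeze();

    StringField(String value) { super(DEFAULT_TYPE, value); }
}

class Document {
    private final List<Field> fields = new ArrayList<>();

    void add(Field f)     { fields.add(f); }
    List<Field> fields()  { return fields; }
}
```

With this shape, new FieldType(TextField.DEFAULT_TYPE) gives an app its
own mutable copy, and one customized FieldType instance can be passed
to any number of Field instances while each Field keeps its own value.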

Does that sound "roughly" like the goal here....?


> Separately specify a field's type
> ---------------------------------
>
>                 Key: LUCENE-2308
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2308
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>              Labels: gsoc2011, lucene-gsoc-11, mentor
>             Fix For: 4.0
>
>         Attachments: LUCENE-2308.patch, LUCENE-2308.patch
>
>
> This came up from discussions on IRC.  I'm summarizing here...
> Today when you make a Field to add to a document you can set things
> like indexed or not, stored or not, analyzed or not, details like omitTfAP,
> omitNorms, index term vectors (separately controlling
> offsets/positions), etc.
> I think we should factor these out into a new class (FieldType?).
> Then you could re-use this FieldType instance across multiple fields.
> The Field instance would still hold the actual value.
> We could then do per-field analyzers by adding a setAnalyzer on the
> FieldType, instead of the separate PerFieldAnalyzerWrapper (likewise
> for per-field codecs (with flex), where we now have
> PerFieldCodecWrapper).
> This would NOT be a schema!  It's just refactoring what we already
> specify today.  EG it's not serialized into the index.
> This has been discussed before, and I know Michael Busch opened a more
> ambitious (I think?) issue.  I think this is a good first baby step.  We could
> consider a hierarchy of FieldType (NumericFieldType, etc.) but maybe hold
> off on that for starters...

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
