analyzed field: Store internal value instead of input one
---------------------------------------------------------
Key: SOLR-1997
URL: https://issues.apache.org/jira/browse/SOLR-1997
Project: Solr
Issue Type: New Feature
Affects Versions: 1.4.1, 1.4, 1.5
Reporter: Joan Codina
Fix For: 1.5, 1.4.1, 1.4
Solr provides a set of tokenizers and filters for processing text, but when a
field is set to be stored, the stored text is the original input. This is
useful when the end user reads the stored value, but not in other cases: for
example, when the input carries payloads and looks like
A|2.0 good|1.0 day|3.0, or when the result of a query is post-processed by
something like Carrot2.
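To make the payload case concrete, here is a minimal Python sketch (not Solr
code) that separates terms from payloads in the sample text above. The helper
name split_payloads and the use of "|" as the delimiter are assumptions made
for illustration only.

```python
from typing import List, Optional, Tuple


def split_payloads(text: str, delimiter: str = "|") -> Tuple[List[str], List[Optional[float]]]:
    """Split whitespace-separated 'term|payload' tokens into two parallel lists.

    Hypothetical helper for illustration; a token without a payload gets None.
    """
    terms: List[str] = []
    payloads: List[Optional[float]] = []
    for token in text.split():
        term, _, payload = token.partition(delimiter)
        terms.append(term)
        payloads.append(float(payload) if payload else None)
    return terms, payloads


terms, payloads = split_payloads("A|2.0 good|1.0 day|3.0")
print(terms)     # ['A', 'good', 'day']
print(payloads)  # [2.0, 1.0, 3.0]
```

When the raw input is stored, the payload markup is returned along with the
text, which is rarely what a reader or a downstream tool wants to see.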
So this is a simple new kind of field that first runs its input through the
analysis of a given type (the source), and then performs the normal processing
on that output with the desired tokenizers and filters. The difference is that
the stored value is the output of the source type, and this is what is
retrieved when getting the document.
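The behaviour described above can be sketched in a few lines of Python. This is
an illustration of the idea only, not the attached implementation: the
source_analyze chain (lowercasing plus a tiny stop-word list) is a hypothetical
stand-in for whatever the source type does, and joining the source tokens with
a single space is an assumption, since the issue does not say how the output is
serialized.

```python
from typing import Callable, List, Tuple

# Hypothetical stop-word list used by the stand-in source analysis below.
STOP_WORDS = {"a", "the", "of"}


def source_analyze(text: str) -> List[str]:
    """Stand-in for the source type: lowercase, split, drop stop words."""
    return [t for t in text.lower().split() if t not in STOP_WORDS]


def whitespace_tokenize(text: str) -> List[str]:
    """Stand-in for a whitespace tokenizer."""
    return text.split()


def analyzed_field(
    raw: str,
    pre_process: Callable[[str], List[str]] = source_analyze,
    index_analyzer: Callable[[str], List[str]] = whitespace_tokenize,
) -> Tuple[str, List[str]]:
    """Return (stored_value, indexed_tokens).

    The stored value is the output of the source analysis (assumed here to be
    space-joined), and the indexed tokens come from analyzing that output.
    """
    stored = " ".join(pre_process(raw))
    indexed = index_analyzer(stored)
    return stored, indexed


stored, indexed = analyzed_field("The Quick brown Fox")
print(stored)   # quick brown fox
print(indexed)  # ['quick', 'brown', 'fox']
```

A regular stored field would return "The Quick brown Fox" here; the field
described in this issue would return the processed text instead.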
The name of the field type class is AnalyzedField. In the schema it is
introduced in the following way, here deriving an analyzed type from
SourceType; the preProcessType attribute names the source type:
<fieldType name="SourceType" class="solr.TextField" >
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class......." />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter ....." />
  </analyzer>
</fieldType>
<fieldType name="analyzedSoureType" class="solr.AnalyzedField"
           positionIncrementGap="100" preProcessType="SourceType">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
Often only the WhitespaceTokenizerFactory is needed, as the tokens have
already been split by the SourceType.
Finally, a field can be declared as:
<field name="analyzedData" type="analyzedSoureType" indexed="true"
stored="true" termVectors="true" multiValued="true"/>
This field can be written to directly, or it can be defined as a copy of a
source field:
<field name="Data" type="analyzedSoureType" indexed="true" stored="true"
termVectors="true" multiValued="true"/>
...
<copyField source="Data" dest="analyzedData"/>