[
https://issues.apache.org/jira/browse/SOLR-4648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621024#comment-13621024
]
David Smiley commented on SOLR-4648:
------------------------------------
Feedback on your patch:
* in Solr, it's bad form to call Class.forName() as you're doing in
PreAnalzyedField; you're supposed to use the SolrResourceLoader instead.
Instead use {{schema.getResourceLoader().findClass(implName,
PreAnalyzedParser.class);}}
* minor: you're using 4-spaces indent instead of 2 in the main value loop in
your URP
* in those log.debug() calls, it's creating the string to potentially not even
log it. Instead use this feature of SLF4J: {{log.debug("Ignoring stored part -
field '{}' is not stored.", sf.getName());}} So few people use that great
feature of SLF4J and instead rely on more bulky surrounding conditionals.
* When you call sf.createFields(o, 1.0f) in the URP, isn't this doing analysis
-- the very analysis that you're trying to avoid by pre-analyzing? It is. And
isn't 'o' the JSON or similar representation of the analyzed token stream, and
thus shouldn't be passed in here? When I run your test, I get a logged error
because of this, actually. Stepping through with a debugger further validated
this. This must be remedied. Looking at what FieldType.createField() is
doing, I propose you do the same in this URP:
{code}
org.apache.lucene.document.FieldType newType = new
org.apache.lucene.document.FieldType();
newType.setIndexed(field.indexed());
newType.setTokenized(field.isTokenized());
newType.setStored(field.stored());
newType.setOmitNorms(field.omitNorms());
newType.setIndexOptions(getIndexOptions(field, val));
newType.setStoreTermVectors(field.storeTermVector());
newType.setStoreTermVectorOffsets(field.storeTermOffsets());
newType.setStoreTermVectorPositions(field.storeTermPositions());
//getIndexOptions() is:
IndexOptions options = IndexOptions.DOCS_AND_FREQS_AND_POSITIONS;
if (field.omitTermFreqAndPositions()) {
options = IndexOptions.DOCS_ONLY;
} else if (field.omitPositions()) {
options = IndexOptions.DOCS_AND_FREQS;
} else if (field.storeOffsetsWithPositions()) {
options = IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS;
}
//then what createField() does is:
Field f = new Field(name, val, type);
f.setBoost(boost);
{code}
This won't work for all field types (e.g. numbers) but it will for TextField
(the one with analysis!) and could potentially for a custom FieldType. I don't
think we should put much if any work into supporting other FieldTypes.
The resulting "Field" instance can be shared from one document to the next I
believe, and so you can cache this in the URP and reset its value & tokenStream.
> Create a PreAnalyzedUpdateProcessor
> -----------------------------------
>
> Key: SOLR-4648
> URL: https://issues.apache.org/jira/browse/SOLR-4648
> Project: Solr
> Issue Type: Bug
> Components: update
> Affects Versions: 4.3, 5.0
> Reporter: Andrzej Bialecki
> Assignee: Andrzej Bialecki
> Fix For: 4.3, 5.0
>
> Attachments: SOLR-4648.patch, SOLR-4648.patch, SOLR-4648.patch
>
>
> Spin-off from the discussion in SOLR-4619.
> Instead of using a PreAnalyzedField type we can use an UpdateRequestProcessor
> that converts any input field values to StorableField-s, using the
> PreAnalyzedParser-s, and then directly passes StorableField-s to
> DocumentBuilder for indexing.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]