[ 
https://issues.apache.org/jira/browse/SOLR-4648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621024#comment-13621024
 ] 

David Smiley commented on SOLR-4648:
------------------------------------

Feedback on your patch:
* in Solr, it's bad form to call Class.forName() as you're doing in 
PreAnalzyedField; you're supposed to use the SolrResourceLoader instead.  
Instead use {{schema.getResourceLoader().findClass(implName, 
PreAnalyzedParser.class);}}
* minor: you're using 4-spaces indent instead of 2 in the main value loop in 
your URP
* in those log.debug() calls, it's creating the string to potentially not even 
log it.  Instead use this feature of SLF4J: {{log.debug("Ignoring stored part - 
field '{}' is not stored.", sf.getName());}}  So few people use that great 
feature of SLF4J and instead rely on more bulky surrounding conditionals.
* When you call sf.createFields(o, 1.0f) in the URP, isn't this doing analysis 
-- the very analysis that you're trying to avoid by pre-analyzing?  It is.  And 
isn't 'o' the JSON or similar representation of the analyzed token stream, and 
thus shouldn't be passed in here?  When I run your test, I get a logged error 
because of this, actually.  Stepping through with a debugger further validated 
this.  This must be remedied.  Looking at what FieldType.createField() is 
doing, I propose you do the same in this URP:

{code}
org.apache.lucene.document.FieldType newType = new 
org.apache.lucene.document.FieldType();
    newType.setIndexed(field.indexed());
    newType.setTokenized(field.isTokenized());
    newType.setStored(field.stored());
    newType.setOmitNorms(field.omitNorms());
    newType.setIndexOptions(getIndexOptions(field, val));
    newType.setStoreTermVectors(field.storeTermVector());
    newType.setStoreTermVectorOffsets(field.storeTermOffsets());
    newType.setStoreTermVectorPositions(field.storeTermPositions());
//getIndexOptions() is:
    IndexOptions options = IndexOptions.DOCS_AND_FREQS_AND_POSITIONS;
    if (field.omitTermFreqAndPositions()) {
      options = IndexOptions.DOCS_ONLY;
    } else if (field.omitPositions()) {
      options = IndexOptions.DOCS_AND_FREQS;
    } else if (field.storeOffsetsWithPositions()) {
      options = IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS;
    }
//then what createField() does is:
    Field f = new Field(name, val, type);
    f.setBoost(boost);
{code}
This won't work for all field types (e.g. numbers) but it will for TextField 
(the one with analysis!) and could potentially for a custom FieldType.  I don't 
think we should put much if any work into supporting other FieldTypes.

The resulting "Field" instance can be shared from one document to the next I 
believe, and so you can cache this in the URP and reset its value & tokenStream.
                
> Create a PreAnalyzedUpdateProcessor
> -----------------------------------
>
>                 Key: SOLR-4648
>                 URL: https://issues.apache.org/jira/browse/SOLR-4648
>             Project: Solr
>          Issue Type: Bug
>          Components: update
>    Affects Versions: 4.3, 5.0
>            Reporter: Andrzej Bialecki 
>            Assignee: Andrzej Bialecki 
>             Fix For: 4.3, 5.0
>
>         Attachments: SOLR-4648.patch, SOLR-4648.patch, SOLR-4648.patch
>
>
> Spin-off from the discussion in SOLR-4619.
> Instead of using a PreAnalyzedField type we can use an UpdateRequestProcessor 
> that converts any input field values to StorableField-s, using the 
> PreAnalyzedParser-s, and then directly passes StorableField-s to 
> DocumentBuilder for indexing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to