[jira] [Assigned] (SOLR-3287) 3x tutorial tries to demo schema features that don't work with 3x schema

Hoss Man (Assigned) (JIRA) Tue, 27 Mar 2012 18:21:45 -0700

     [ https://issues.apache.org/jira/browse/SOLR-3287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Hoss Man reassigned SOLR-3287:
------------------------------

    Assignee: Hoss Man
    
> 3x tutorial tries to demo schema features that don't work with 3x schema
> ------------------------------------------------------------------------
>
>                 Key: SOLR-3287
>                 URL: https://issues.apache.org/jira/browse/SOLR-3287
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Hoss Man
>            Assignee: Hoss Man
>            Priority: Blocker
>             Fix For: 3.6
>
>
> I just audited the tutorial on the 3x branch to ensure everything would work 
> for the 3.6 release, and ran into two sections where things were very 
> confusing and seemed broken to me (even as a Solr expert):
> https://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/solr/core/src/java/doc-files/tutorial.html
> 1) "Text Analysis": of the 5 queries in this section, only the "pixima" 
> example works (power-shot matches documents, but not the ones the tutorial 
> suggests it should, and for different reasons).  The lead-in paragraph does 
> explain that you have to edit your schema.xml in order for these links to 
> work, but it's confusing, and I honestly read it 3 times before I realized 
> what it was saying (the first two times I thought it was saying that 
> _because_ the content is in English, English-specific field types are used, 
> and you can change those to text_general if you don't use English).
> Bottom line: the links are confusing since they don't work "out of the box" 
> with the simple commands shown so far.
> {panel}
> If you know your textual content is English, as is the case for the example 
> documents in this tutorial, and you'd like to apply English-specific stemming 
> and stop word removal, as well as split compound words, you can use the 
> text_en_splitting fieldType instead. Go ahead and edit the schema.xml under 
> the solr/example/solr/conf directory, and change the type for fields text and 
> features from text_general to text_en_splitting. Restart the server and then 
> re-post all of the documents, and then these queries will show the 
> English-specific transformations: 
> * A search for power-shot matches PowerShot, and adata matches A-DATA due to 
> the use of WordDelimiterFilter and LowerCaseFilter.
> * A search for features:recharging matches Rechargeable due to stemming with 
> the EnglishPorterFilter.
> * A search for "1 gigabyte" matches things with GB, and the misspelled pixima 
> matches Pixma due to use of a SynonymFilter.
> {panel}
> * http://localhost:8983/solr/select/?indent=on&q=power-shot&fl=name
> * http://localhost:8983/solr/select/?indent=on&q=adata&fl=name
> * http://localhost:8983/solr/select/?indent=on&q=features:recharging&fl=name,features
> * http://localhost:8983/solr/select/?indent=on&q=%221%20gigabyte%22&fl=name
> * http://localhost:8983/solr/select/?indent=on&q=pixima&fl=name
> 2) "Analysis Debugging": likewise, all of the analysis.jsp example URLs 
> attempt to show off how various features work, but the fields used don't 
> demonstrate the analysis being discussed unless the user has edited the 
> schema as discussed in the previous section.
> {panel}
> This shows how "Canon Power-Shot SD500" would be indexed as a value in the 
> name field. Each row of the table shows the resulting tokens after having 
> passed through the next TokenFilter in the analyzer for the name field. 
> Notice how both powershot and power, shot are indexed. Tokens generated at 
> the same position are shown in the same column, in this case shot and 
> powershot.
> Selecting verbose output will show more details, such as the name of each 
> analyzer component in the chain, token positions, and the start and end 
> positions of the token in the original text.
> Selecting highlight matches when both index and query values are provided 
> will take the resulting terms from the query value and highlight all matches 
> in the index value analysis.
> Here is an example of stemming and stop-words at work. 
> {panel}
> * http://localhost:8983/solr/admin/analysis.jsp?name=name&val=Canon+Power-Shot+SD500
> * http://localhost:8983/solr/admin/analysis.jsp?name=name&verbose=on&val=Canon+Power-Shot+SD500
> * http://localhost:8983/solr/admin/analysis.jsp?name=name&highlight=on&val=Canon+Power-Shot+SD500&qval=Powershot%20sd-500
> * http://localhost:8983/solr/admin/analysis.jsp?name=text&highlight=on&val=Four+score+and+seven+years+ago+our+fathers+brought+forth+on+this+continent+a+new+nation%2C+conceived+in+liberty+and+dedicated+to+the+proposition+that+all+men+are+created+equal.+&qval=liberties+and+equality
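
For anyone re-checking the "Text Analysis" links after a schema change, the five select URLs quoted in the issue can be regenerated from their query parameters. This is only a minimal sketch and is not part of the tutorial or the issue: the names `BASE`, `QUERIES`, and `select_url` are hypothetical, the host and port are assumed to be the example server's defaults as they appear in the links, and the code only builds the URLs without contacting Solr.

```python
from urllib.parse import quote, urlencode

# Assumption: the example server address used by the links in the issue.
BASE = "http://localhost:8983/solr/select/"

# (q, fl) pairs copied from the five links in the "Text Analysis" section.
QUERIES = [
    ("power-shot", "name"),
    ("adata", "name"),
    ("features:recharging", "name,features"),
    ('"1 gigabyte"', "name"),
    ("pixima", "name"),
]


def select_url(q, fl):
    """Build a select URL in the same form as the tutorial links."""
    params = {"indent": "on", "q": q, "fl": fl}
    # quote (rather than quote_plus) encodes spaces as %20;
    # safe=":," leaves ':' and ',' unescaped, matching the original links.
    return BASE + "?" + urlencode(params, safe=":,", quote_via=quote)


if __name__ == "__main__":
    for q, fl in QUERIES:
        print(select_url(q, fl))
```

Keeping the queries as data makes it easy to paste each URL into a browser before and after switching the field types, to see which of the five actually change behavior.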


        
