3x tutorial tries to demo schema features that don't work with 3x schema
------------------------------------------------------------------------

                 Key: SOLR-3287
                 URL: https://issues.apache.org/jira/browse/SOLR-3287
             Project: Solr
          Issue Type: Bug
            Reporter: Hoss Man
            Priority: Blocker
             Fix For: 3.6


I just audited the tutorial on the 3x branch to ensure everything would work 
for the 3.6 release, and ran into two sections where things were very 
confusing and seemed broken to me (even as a Solr expert):

https://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/solr/core/src/java/doc-files/tutorial.html

1) "Text Analysis": of the 5 queries in this section, only the "pixima" example 
works (power-shot matches documents, but not the ones the tutorial suggests it 
should, and for different reasons).  The lead-in paragraph does explain that 
you have to edit your schema.xml in order for these links to work, but it's 
confusing, and I honestly read it 3 times before I realized what it was saying 
(the first two times I thought it was saying that _because_ the content is in 
English, English-specific field types are used, and you can change those to 
text_general if you don't use English).

Bottom line: the links are confusing since they don't work "out of the box" 
with the simple commands shown so far.

{panel}
If you know your textual content is English, as is the case for the example 
documents in this tutorial, and you'd like to apply English-specific stemming 
and stop word removal, as well as split compound words, you can use the 
text_en_splitting fieldType instead. Go ahead and edit the schema.xml under the 
solr/example/solr/conf directory, and change the type for fields text and 
features from text_general to text_en_splitting. Restart the server and then 
re-post all of the documents, and then these queries will show the 
English-specific transformations: 

* A search for power-shot matches PowerShot, and adata matches A-DATA due to 
the use of WordDelimiterFilter and LowerCaseFilter.
* A search for features:recharging matches Rechargeable due to stemming with 
the EnglishPorterFilter.
* A search for "1 gigabyte" matches things with GB, and the misspelled pixima 
matches Pixma due to use of a SynonymFilter.
{panel}
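
To make the first bullet in the panel concrete, here is a toy Python approximation of the behaviour it describes: split on punctuation and on lower-to-upper case changes, also keep the joined form, then lowercase everything. This is only a sketch for illustration; it is not Solr's WordDelimiterFilter (which has more rules and options, for example around digits), and the name `toy_terms` is made up.

```python
import re

def toy_terms(text):
    """Rough stand-in for word-delimiter splitting plus lowercasing (illustration only)."""
    terms = set()
    for word in text.split():
        # Split on non-alphanumeric characters and on lower-to-upper case changes.
        # (Splitting on the empty case-change match needs Python 3.7 or later.)
        parts = [p for p in re.split(r"[^A-Za-z0-9]+|(?<=[a-z])(?=[A-Z])", word) if p]
        terms.update(p.lower() for p in parts)
        # Also keep the parts joined together as one catenated term.
        if len(parts) > 1:
            terms.add("".join(parts).lower())
    return terms

# A query "matches" here if it shares at least one term with the indexed value.
print(toy_terms("power-shot") & toy_terms("PowerShot"))
print(toy_terms("adata") & toy_terms("A-DATA"))
```

With a field type that does none of this splitting or catenation, the query and indexed terms would not be expected to line up this way, which is the gap described above.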

* http://localhost:8983/solr/select/?indent=on&q=power-shot&fl=name
* http://localhost:8983/solr/select/?indent=on&q=adata&fl=name
* http://localhost:8983/solr/select/?indent=on&q=features:recharging&fl=name,features
* http://localhost:8983/solr/select/?indent=on&q=%221%20gigabyte%22&fl=name
* http://localhost:8983/solr/select/?indent=on&q=pixima&fl=name
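
For anyone who wants to re-check these five links after a schema change, a small Python sketch along these lines can rebuild the URLs and fetch them. It assumes the example server is running on localhost:8983 as in the tutorial; the helper names `select_url` and `fetch` are made up for this sketch, and no particular response is assumed.

```python
from urllib.parse import urlencode, quote
from urllib.request import urlopen

SELECT_URL = "http://localhost:8983/solr/select/"

# (query, field list) pairs taken from the tutorial links above.
TUTORIAL_QUERIES = [
    ("power-shot", "name"),
    ("adata", "name"),
    ("features:recharging", "name,features"),
    ('"1 gigabyte"', "name"),
    ("pixima", "name"),
]

def select_url(q, fl):
    """Build a select URL in the same shape as the tutorial links."""
    params = [("indent", "on"), ("q", q), ("fl", fl)]
    # Keep ':' and ',' unescaped so the URLs match the ones listed above.
    return SELECT_URL + "?" + urlencode(params, safe=":,", quote_via=quote)

def fetch(url):
    """Fetch one URL; assumes the example Solr server is running locally."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8")

if __name__ == "__main__":
    for q, fl in TUTORIAL_QUERIES:
        print(select_url(q, fl))
        # Uncomment once the server is up to inspect the raw XML response:
        # print(fetch(select_url(q, fl)))
```

Running it once before and once after the schema edit (plus re-posting the documents) makes it easy to compare which queries return the documents the tutorial promises.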

2) "Analysis Debugging"

Likewise, all of the analysis.jsp example URLs attempt to show off how various 
features work, but the fields used don't demonstrate the analysis being 
discussed unless the user has edited the schema as described in the previous 
section.

{panel}
This shows how "Canon Power-Shot SD500" would be indexed as a value in the name 
field. Each row of the table shows the resulting tokens after having passed 
through the next TokenFilter in the analyzer for the name field. Notice how 
both powershot and power, shot are indexed. Tokens generated at the same 
position are shown in the same column, in this case shot and powershot.

Selecting verbose output will show more details, such as the name of each 
analyzer component in the chain, token positions, and the start and end 
positions of the token in the original text.

Selecting highlight matches when both index and query values are provided will 
take the resulting terms from the query value and highlight all matches in the 
index value analysis.

Here is an example of stemming and stop-words at work. 
{panel}
* http://localhost:8983/solr/admin/analysis.jsp?name=name&val=Canon+Power-Shot+SD500
* http://localhost:8983/solr/admin/analysis.jsp?name=name&verbose=on&val=Canon+Power-Shot+SD500
* http://localhost:8983/solr/admin/analysis.jsp?name=name&highlight=on&val=Canon+Power-Shot+SD500&qval=Powershot%20sd-500
* http://localhost:8983/solr/admin/analysis.jsp?name=text&highlight=on&val=Four+score+and+seven+years+ago+our+fathers+brought+forth+on+this+continent+a+new+nation%2C+conceived+in+liberty+and+dedicated+to+the+proposition+that+all+men+are+created+equal.+&qval=liberties+and+equality
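
Since both sections depend on which fieldType a field is actually using, one quick way to check before following the links is to read the `type` attribute of the relevant `<field>` entries in schema.xml. The Python sketch below does that against a tiny made-up schema string; the function name `field_types` and the sample schema are hypothetical, and a real check would read the schema.xml from the conf directory instead.

```python
import xml.etree.ElementTree as ET

def field_types(schema_xml, names):
    """Return {field name: fieldType name} for the requested fields."""
    root = ET.fromstring(schema_xml)
    wanted = set(names)
    return {
        f.get("name"): f.get("type")
        for f in root.iter("field")
        if f.get("name") in wanted
    }

# Hypothetical, heavily trimmed schema used only to exercise the function.
SAMPLE_SCHEMA = """
<schema name="example" version="1.5">
  <fields>
    <field name="features" type="text_general" indexed="true" stored="true"/>
    <field name="text" type="text_general" indexed="true" stored="false"/>
  </fields>
</schema>
"""

print(field_types(SAMPLE_SCHEMA, ["text", "features"]))
```

If the fields still report text_general rather than text_en_splitting, the stemming and word-splitting examples above should not be expected to behave as the tutorial text describes.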


