[
https://issues.apache.org/jira/browse/SOLR-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13414531#comment-13414531
]
Jan Høydahl commented on SOLR-3619:
-----------------------------------
bq. we are renaming it from example because we are recognizing people use it as
the default to build on.
And that's all fine - people need to start somewhere. But if they think that
adding a few <field>s to schema.xml is all Solr has to offer, they'll build
crappy search apps - I've seen many of these out there. So calling it
example (or template or skeleton or whatever) gives people a hint that it's
not something they should expect to be sufficient for their needs without
some more tuning (Solr is not GSA...)
{quote}
bq. Today's "collection1" is not very well tuned for PDF/HTML kind of docs
I haven't tried it in a while... can we improve it w/o getting in the way of
people who don't use solr-cell?
{quote}
When Solr is compared to various other search engines, what tends to get
tested is web/filesystem crawling. So I really think that if we include ONE
main "example" config, it should be geared towards HTML/PDF/DOC indexing,
either from crawling or from pushing files from the filesystem. That would
mean that you have a title, a teaser, body, URL/path and various metadata.
There has been some discussion on the list about improving the user
experience for this type of input.
Sure, it is harder (much harder) to get excellent results from unstructured
text than from some nice synthetic structured xml docs, so it would take some
work to let Solr shine in those comparisons. One needed piece could be an
improved post.jar (or a feeder wrapper script) which can recursively traverse
folders and push files matching certain file types, with the correct MIME
type and a unique ID. That would let people quickly index, say, their home
folder, and then view the results in Solritas.
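To make the feeder idea concrete, here is a rough sketch of what such a
wrapper script might look like. It is not post.jar and not an agreed design.
The extension list, the choice of the absolute path as unique ID, and the
function names plan_uploads and post_file are all assumptions made for
illustration. The upload step assumes a solr-cell style extract handler that
accepts the raw file body and takes the ID as a literal.id parameter.

```python
import mimetypes
import os
import urllib.parse
import urllib.request

# Assumed set of file types worth indexing; adjust as needed.
WANTED = {".pdf", ".html", ".htm", ".doc", ".txt"}


def plan_uploads(root, wanted=WANTED):
    """Walk root recursively, yielding (path, mime_type, doc_id) per match."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            ext = os.path.splitext(name)[1].lower()
            if ext not in wanted:
                continue
            path = os.path.abspath(os.path.join(dirpath, name))
            mime, _encoding = mimetypes.guess_type(path)
            # Assumption: the absolute path doubles as the unique ID.
            # Fall back to a generic type if the extension is unknown.
            yield path, mime or "application/octet-stream", path


def post_file(extract_url, path, mime, doc_id):
    """POST one file body to the handler, passing the ID as literal.id."""
    query = urllib.parse.urlencode({"literal.id": doc_id})
    with open(path, "rb") as fh:
        body = fh.read()
    req = urllib.request.Request(
        extract_url + "?" + query,
        data=body,
        headers={"Content-Type": mime},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Usage would be along the lines of looping over plan_uploads("/home/me") and
calling post_file with an extract URL such as
http://localhost:8983/solr/update/extract (a hypothetical endpoint that
depends on the local setup), followed by a commit so the documents become
searchable.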
> Rename 'example' dir to 'server'
> --------------------------------
>
> Key: SOLR-3619
> URL: https://issues.apache.org/jira/browse/SOLR-3619
> Project: Solr
> Issue Type: Improvement
> Reporter: Mark Miller
> Assignee: Mark Miller
> Fix For: 4.0, 5.0
>
> Attachments: SOLR-3619.patch, server-name-layout.png
>
>