[
https://issues.apache.org/jira/browse/SOLR-12593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16611590#comment-16611590
]
David Smiley commented on SOLR-12593:
-------------------------------------
I took the PR and made further changes to the ref guide page for this feature,
plus a few small changes to the default config. My changes grew in scope to
cover misc things I didn't like in the guide for this feature, but I hope other
committers are happy with them. FWIW I held back from doing more :)
[~arafalov] I know you tend to the configs, so I'm hoping you can review this
(or anyone else, of course).
* Revamped the "Key Solr Cell Concepts" section.
* Switched the examples / "trying out" instructions from using the
"techproducts" example config to using our default config (via -e
"schemaless"). Why? Firstly, I observed that the techproducts config didn't
have the URPs I wanted. Fixable, yes, but... Secondly, I think it simply
doesn't make sense to have the "techproducts" config, by virtue of its name,
have things other than .. you know... _tech products_.
** The default configset's schema oddly does not include an "ignored" field
type and "ignored_*" dynamic field. I added them. These are useful, especially
with Solr Cell.
** Minutiae: removed the mapping of the metadata name "meta" to "ignored_" from
the default parameters of the default configset's /update/extract request
handler. I don't see the point of it, and FWIW it's not in the techproducts
config either. Let's keep this config more minimal.
** The default configset is schemaless, so the "try tika" instructions were
modified to reflect that all of the metadata is now added automatically.
Previously, only those fields that happened to be in the techproducts schema
were added. This is good, but there is an awkward part in the last step of the
demo: if you want to _not_ map the metadata, you have to wipe the core and
start over.
* Added a tip on URPs with an example to specify these processors.
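
For reviewers who want to picture the kind of request the URP tip is about,
here is a rough sketch, not the text that went into the ref guide. It only
builds an /update/extract URL and sends nothing. The helper name
build_extract_url, the host, the "gettingstarted" collection, and the
"parse-date" processor name are all assumptions made for illustration; check
them against your own solrconfig.xml before relying on them.

```python
from urllib.parse import urlencode


def build_extract_url(base_url, collection, processors, uprefix=None, commit=True):
    """Build a Solr Cell /update/extract URL. Hypothetical helper; sends no request."""
    params = {}
    if processors:
        # Assumption: named URPs are passed as a comma-separated "processor" parameter.
        params["processor"] = ",".join(processors)
    if uprefix:
        # Prefix applied to field names that are not defined in the schema.
        params["uprefix"] = uprefix
    if commit:
        params["commit"] = "true"
    return f"{base_url}/{collection}/update/extract?{urlencode(params)}"


# Assumed values: a local Solr, the "gettingstarted" collection, a "parse-date" URP.
url = build_extract_url(
    "http://localhost:8983/solr",
    "gettingstarted",
    ["parse-date"],
    uprefix="ignored_",
)
print(url)
```

Passing uprefix="ignored_" is what makes the "ignored_*" dynamic field useful:
unknown metadata fields get routed to a field that is neither indexed nor
stored.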
> Remove date parsing functionality from extraction contrib
> ---------------------------------------------------------
>
> Key: SOLR-12593
> URL: https://issues.apache.org/jira/browse/SOLR-12593
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Components: contrib - Solr Cell (Tika extraction)
> Reporter: David Smiley
> Assignee: David Smiley
> Priority: Major
> Fix For: master (8.0)
>
> Attachments: SOLR-12593.patch
>
> Time Spent: 3h 50m
> Remaining Estimate: 0h
>
> The date parsing functionality in the extraction contrib is obsoleted by
> equivalent functionality in ParseDateFieldUpdateProcessorFactory. It should
> be removed. We should add documentation within this part of the ref guide on
> how to accomplish the same (and test it).