[jira] [Commented] (SOLR-10229) See what it would take to shift many of our one-off schemas used for testing to managed schema and construct them as part of the tests

Erick Erickson (JIRA) Tue, 11 Apr 2017 22:29:57 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15965400#comment-15965400
 ]


Erick Erickson commented on SOLR-10229:
---------------------------------------

This looks really interesting. I'll look some more tomorrow, but some nits 
and a significant question or two:

nit1: The default managed-schema used by solrconfig-managed-schema.xml is 
actually very small. So since the first schema or two will be used as a 
template, let's not introduce the property with 
System.setProperty("managed.schema.resourceName", "schema-tiny.xml") and just 
use solrconfig-managed-schema.xml and managed-schema in initCore.

nit2: positionIncrementGap is mostly useful when you have multiValued fields 
and are testing phrase queries that don't cross that gap, so setting them here 
is probably unnecessary. Since this will be the first place others look to see 
how to use this capability, let's nuke those.
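
For readers unfamiliar with the gap, here is a minimal toy model of why it 
only matters for multiValued fields. This is not Lucene or Solr code: GapSketch 
and phraseMatches are hypothetical names, and the position arithmetic (the 
first token of each later value lands gap extra positions after the previous 
value's last token) is a simplified assumption about how positions are 
assigned.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of token positions in a multiValued field. NOT Lucene/Solr code.
final class GapSketch {

    // Returns true if `first` is immediately followed by `second`
    // (position difference of exactly 1) anywhere in the field.
    static boolean phraseMatches(List<String> values, int gap,
                                 String first, String second) {
        List<String> tokens = new ArrayList<>();
        List<Integer> positions = new ArrayList<>();
        int position = -1;
        for (int v = 0; v < values.size(); v++) {
            if (v > 0) {
                position += gap; // assumed: the gap is added between values
            }
            for (String tok : values.get(v).trim().split("\\s+")) {
                position++;
                tokens.add(tok);
                positions.add(position);
            }
        }
        // Positions strictly increase, so only neighbours can be adjacent.
        for (int i = 0; i + 1 < tokens.size(); i++) {
            boolean sameTerms = tokens.get(i).equals(first)
                    && tokens.get(i + 1).equals(second);
            if (sameTerms && positions.get(i + 1) - positions.get(i) == 1) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> values = new ArrayList<>();
        values.add("quick brown");
        values.add("fox jumps");
        // "brown fox" spans two values, so the gap decides the outcome.
        System.out.println(phraseMatches(values, 0, "brown", "fox"));
        System.out.println(phraseMatches(values, 100, "brown", "fox"));
    }
}
```

In this model the cross-value phrase matches with a gap of 0 and stops 
matching with a gap of 100, while a phrase inside a single value is unaffected 
either way. A test that never runs phrase queries across values has no need to 
set the attribute.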

nit3: mother-schema has <defaultSearchField>, which is deprecated.

nit4: Do the <copyField> directives in mother-schema make sense?


Significant question 1:

I'm a little uncomfortable with the build(h.getCore) method being called for 
every field type addition and every field addition. I'm not sure how much work 
that entails; will we be reloading the schema again and again? Perhaps 
[[email protected]] or [~hossman] can weigh in (or I'll look more in the 
morning). WDYT about this kind of pattern?

List<FieldType> listBlah = new ArrayList<>();
listBlah.add(framework.createNewFieldType(blah blah blah).build(?));
listBlah.add(framework.createNewFieldType(blah blah blah).build(?));
framework.addFieldTypes(listBlah);

Ditto for Fields.... As I said, I really don't know whether this'll be more 
efficient or not; think of this as a marker to make sure we answer before 
diving in totally.
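
To make the concern concrete, here is a minimal, self-contained sketch of the 
two patterns. It deliberately uses no Solr classes: ToySchema, addFieldType, 
addFieldTypes and the reload counter are hypothetical stand-ins, and the 
assumption that every individual addition triggers a schema reload is exactly 
the open question above, not a measured fact about the patch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a managed schema. NOT a Solr class.
final class ToySchema {
    private final Map<String, String> fieldTypes = new LinkedHashMap<>();
    private int reloadCount = 0;

    // Per-call pattern: add one type, then "reload" immediately.
    void addFieldType(String name, String className) {
        fieldTypes.put(name, className);
        reload();
    }

    // Batched pattern: add every {name, className} pair, then "reload" once.
    void addFieldTypes(List<String[]> types) {
        for (String[] type : types) {
            fieldTypes.put(type[0], type[1]);
        }
        reload();
    }

    // Stands in for whatever work a real schema rebuild would do.
    private void reload() {
        reloadCount++;
    }

    int reloadCount() {
        return reloadCount;
    }

    int size() {
        return fieldTypes.size();
    }
}

final class BatchingSketch {
    public static void main(String[] args) {
        ToySchema perCall = new ToySchema();
        perCall.addFieldType("string", "solr.StrField");
        perCall.addFieldType("text", "solr.TextField");

        ToySchema batched = new ToySchema();
        List<String[]> pending = new ArrayList<>();
        pending.add(new String[] {"string", "solr.StrField"});
        pending.add(new String[] {"text", "solr.TextField"});
        batched.addFieldTypes(pending);

        System.out.println("per-call reloads: " + perCall.reloadCount());
        System.out.println("batched reloads:  " + batched.reloadCount());
    }
}
```

Both schemas end up with the same two types; the per-call version counts two 
reloads and the batched version counts one. Whether the real build() call is 
expensive enough for this to matter is still the thing to verify.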

Significant question 2:

This patch contains LUCENE-7705. I briefly experimented with taking out the 
7705 bits and it seems pretty easy. Yet that's the driver for this 
functionality. WDYT about splitting the 7705 bits out (we'll have to commit 
this one before 7705) and adding this patch with other examples? I'm thinking 
of removing a couple of the current schema files and replacing them with this 
mechanism. I'd be happy to volunteer to do that part of the patch as I'd like 
to get some hands-on experience with this approach. There are some docValues 
schemas that are only used in one or two places that are likely candidates. I 
think the sweet spot for this mechanism is exactly those places where there are 
just a few tests that use a particular schema.

Along the way, by replacing at least two uses of a schema we can 
also make certain some test harness trickery isn't causing us to reload the 
mother-schema for every test suite...

I suppose that replacing some of the other schemas is probably not as bad as I 
expect, as a number of them use, say, schema.xml but only really use a few 
fields out of it, and we can just add existing fields from the mother-schema 
with a couple of lines.

Anyway, great work, I think this is very close!


> See what it would take to shift many of our one-off schemas used for testing 
> to managed schema and construct them as part of the tests
> --------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-10229
>                 URL: https://issues.apache.org/jira/browse/SOLR-10229
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Erick Erickson
>            Assignee: Erick Erickson
>            Priority: Minor
>         Attachments: SOLR-10229.patch, SOLR-10229.patch, SOLR-10229.patch, 
> SOLR-10229.patch, SOLR-10229.patch
>
>
> The test schema files are intimidating. There are about a zillion of them, 
> and making a change in any of them risks breaking some _other_ test. That 
> leaves people three choices:
> 1> add what they need to some existing schema. Which makes schemas bigger and 
> bigger and bigger.
> 2> create a new schema file, adding to the proliferation thereof.
> 3> Look through all the existing tests to see if they have something that 
> works.
> The recent work on LUCENE-7705 is a case in point. We're adding a maxLen 
> parameter to some tokenizers. Putting those parameters into any of the 
> existing schemas, especially to test < 255 char tokens is virtually 
> guaranteed to break other tests, so the only safe thing to do is make another 
> schema file. Adding to the multiplication of files.
> As part of SOLR-5260 I tried creating the schema on the fly rather than 
> creating a new static schema file and it's not hard. WDYT about making this 
> into some better thought-out utility? 
> At present, this is pretty fuzzy, I wanted to get some reactions before 
> putting much effort into it. I expect that the utility methods would 
> eventually get a bunch of canned types. It's reasonably straightforward for 
> primitive types, if lengthy. But when you get into solr.TextField-based types 
> it gets less straight-forward.
> We could manage to just move the "intimidation" from the plethora of schema 
> files to a zillion fieldTypes in the utility to choose from...
> Also, forcing every test to define the fields up-front is arguably less 
> convenient than just having _some_ canned schemas we can use. And erroneous 
> schemas to test failure modes are probably not very good fits for any such 
> framework.
> [~steve_rowe] and [[email protected]] in particular might have 
> something to say.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
