Re: SolrDocument difference between String and text_general

2020-10-20 Thread Shawn Heisey

On 10/20/2020 1:53 AM, Cox, Owen wrote:

> I've now written a Java Spring-Boot program to populate documents (snippet below) using SolrCrudRepository.
> This works when I don't index the "title" field, but when I try to include title I get the following
> error "cannot change field "title" from index options=DOCS_AND_FREQS_AND_POSITIONS to
> inconsistent index options=DOCS"


I have no idea at all what a SolrCrudRepository is.  That must be part 
of Spring's repackaging of SolrJ.  It's probably not important anyway.


Some schema changes require more than a simple reindex.  For those 
changes, you must entirely delete the index directory, so that the 
Lucene index can be built from scratch.


That error message indicates that such a change has been made to the 
schema, and the existing index was NOT deleted before trying to index 
new docs.
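
As an alternative to removing the index directory by hand, the Solr reindexing guide describes clearing the index with a delete-by-query on `*:*` before re-indexing. The following is only a minimal sketch of that request using the JDK HTTP client. The base URL `http://localhost:8983/solr`, the collection name `mycollection`, and the class and method names are assumptions for illustration, not anything taken from the poster's setup.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class DeleteAllDocsSketch {

    // JSON body for Solr's update handler: delete every document.
    static final String DELETE_ALL_BODY = "{\"delete\":{\"query\":\"*:*\"}}";

    // Update handler URL with an immediate commit; base URL and collection are placeholders.
    static String updateUrl(String solrBaseUrl, String collection) {
        return solrBaseUrl + "/" + collection + "/update?commit=true";
    }

    // Builds (but does not send) the delete-by-query request.
    static HttpRequest buildDeleteAll(String solrBaseUrl, String collection) {
        return HttpRequest.newBuilder(URI.create(updateUrl(solrBaseUrl, collection)))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(DELETE_ALL_BODY))
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpRequest request = buildDeleteAll("http://localhost:8983/solr", "mycollection");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(DELETE_ALL_BODY);

        // Only contact Solr when explicitly asked to, since this wipes the collection.
        if (args.length > 0 && "--send".equals(args[0])) {
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        }
    }
}
```

Run without arguments it only prints the request; pass `--send` to actually issue it, then re-index from the original data source.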


Thanks,
Shawn


Re: [EXT: NEWSLETTER] SolrDocument difference between String and text_general

2020-10-20 Thread Konstantinos Koukouvis
Reindexing has to be done either by starting from scratch or by deleting all 
documents and then re-inserting them. Right?
https://lucene.apache.org/solr/guide/8_0/reindexing.html 


Regards,
Konstantinos

> On 20 Oct 2020, at 14:51, Erick Erickson  wrote:
> 
> You’re seeing consistent results here because you started with a _new_ 
> collection that had no old segments lying around.

==
Konstantinos Koukouvis
konstantinos.koukou...@mecenat.com

Using Golang and Solr? Try this: https://github.com/mecenat/solr







Re: [EXT: NEWSLETTER] SolrDocument difference between String and text_general

2020-10-20 Thread Erick Erickson
Owen:

Collection reload is necessary but not sufficient. You’ll still get wonky 
results even if you re-index everything unless you delete _all_ the documents 
first or start with a whole new collection. Each Lucene segment is a “mini index” 
with its own picture of the structure of that index (i.e. the schema in force 
when it was created). If you have segments created with the old schema and 
other segments with the new schema, when they get merged the result is 
undefined. It may not blow up, but it also won't do what you want.
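
For the reload step mentioned above, a rough sketch of calling the Collections API `RELOAD` action from Java could look like this. It assumes SolrCloud (a standalone core would use the CoreAdmin API instead), and the base URL, collection name, and class name are placeholders rather than details from this thread. Reloading alone does not clear old segments, so it complements rather than replaces the delete-and-reindex step.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

class ReloadCollectionSketch {

    // Collections API URL for reloading one collection; the name is URL-encoded.
    static String reloadUrl(String solrBaseUrl, String collection) {
        return solrBaseUrl + "/admin/collections?action=RELOAD&name="
                + URLEncoder.encode(collection, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        String url = reloadUrl("http://localhost:8983/solr", "mycollection");
        System.out.println("GET " + url);

        // Only contact Solr when explicitly asked to.
        if (args.length > 0 && "--send".equals(args[0])) {
            HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        }
    }
}
```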

Take your change from text to string type and the title “my dog has fleas”. In 
the segment with the field defined as a Text type, you’ll be able to search for 
“dog” and get the doc. Similarly for Dog (assuming you have lowercasing in your 
analysis chain). “has fleas” would hit, as would “dog fleas”~2. 

For the segment defined with String, you will only get a hit if you search for 
“my dog has fleas”. You wouldn’t find the doc if you searched for any of the 
following:
- my AND dog AND has AND fleas
- “My dog has fleas”
- fleas
- “dog has fleas my”
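
The difference between the two field types can be mimicked with a toy model in plain Java. This is not how Lucene works internally; it is only a sketch that assumes a text analysis chain consisting of whitespace tokenization plus lowercasing, and all class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

class FieldMatchSketch {

    // String-like field: the whole value is a single term, so only an
    // exact, case-sensitive match on the full value counts as a hit.
    static boolean stringFieldMatches(String stored, String query) {
        return stored.equals(query);
    }

    // Toy analysis chain (assumed): lowercase, then split on whitespace.
    static List<String> analyze(String value) {
        return Arrays.asList(value.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    // Text-like field, single-term query: hit if the analyzed term is among the tokens.
    static boolean textFieldMatchesTerm(String stored, String term) {
        return analyze(stored).contains(term.toLowerCase(Locale.ROOT));
    }

    // Text-like field, phrase query: hit if the analyzed phrase tokens
    // appear consecutively and in order among the stored tokens.
    static boolean textFieldMatchesPhrase(String stored, String phrase) {
        return Collections.indexOfSubList(analyze(stored), analyze(phrase)) >= 0;
    }

    public static void main(String[] args) {
        String title = "my dog has fleas";
        System.out.println("text   term   'Dog'              : " + textFieldMatchesTerm(title, "Dog"));
        System.out.println("text   phrase 'has fleas'        : " + textFieldMatchesPhrase(title, "has fleas"));
        System.out.println("string exact  'my dog has fleas' : " + stringFieldMatches(title, "my dog has fleas"));
        System.out.println("string exact  'My dog has fleas' : " + stringFieldMatches(title, "My dog has fleas"));
        System.out.println("string exact  'fleas'            : " + stringFieldMatches(title, "fleas"));
    }
}
```

In this model the text-like field hits on individual words and in-order phrases regardless of case, while the string-like field hits only on the complete original value.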

When those segments are merged, Lucene doesn’t have the information to “do the 
right thing”, and even if it did the cost would be prohibitive because it’d be 
like re-indexing all the docs in one segment or the other.

You cannot spoof this by simply reindexing the corpus over top of an existing 
index since that’ll involve a bunch of segment merges.

You’re seeing consistent results here because you started with a _new_ 
collection that had no old segments lying around.

Best,
Erick

> On Oct 20, 2020, at 4:37 AM, Cox, Owen  wrote:
> 
> Hi Konstantinos, I think you're onto something there.  I don't think the 
> collection was reloaded.  I've just tried the same code against a different 
> collection that uses the same configset; the only difference is that this 
> collection was created after the schema changes.  That works, so it must 
> have been the reload that was missing.
> 
> Thanks!
> 
> Owen Cox
> Senior Consultant | Deloitte MCS Limited
> D: +44 20 7007 1657
> o...@deloitte.co.uk | www.deloitte.co.uk
> 
> 
> -----Original Message-----
> From: Konstantinos Koukouvis 
> Sent: 20 October 2020 09:04
> To: solr-user@lucene.apache.org
> Subject: [EXT: NEWSLETTER] Re: SolrDocument difference between String and 
> text_general
> 
> Hi Owen,
> 
> If I understand correctly, you have changed the schema, then reloaded the 
> core and reindexed all the data, right? Whenever I have hit this error, I 
> had usually forgotten to do one of those two things…
> 
> Regards,
> Konstantinos
> 
>> On 20 Oct 2020, at 09:53, Cox, Owen  wrote:
>> 
>> Hi folks,
>> 
>> I'm using Solr 8.5.2 and populating documents which include a string field 
>> called "title".  This field used to be text_general, but the data was 
>> reindexed and we've been inserting data happily with REST calls and it's 
>> been behaving as desired.
>> 
>> I've now written a Java Spring-Boot program to populate documents (snippet 
>> below) using SolrCrudRepository.  This works when I don't index the "title" 
>> field, but when I try to include title I get the following error "cannot 
>> change field "title" from index options=DOCS_AND_FREQS_AND_POSITIONS to 
>> inconsistent index options=DOCS"
>> 
>> To me that looks like it's trying to index the title as text_general and 
>> store it in a string field.  But the Solr schema states that the field is 
>> string, all of the data in it is string, and every other string field in 
>> the document is indexed correctly.
>> 
>> Could there be any hanging reference to the field's type anywhere?  Or some 
>> requirement that a field named "title" is always text_general or something 
>> odd like that?
>> 
>> Any help appreciated, thanks
>> Owen
>> 
>> 
>> 
>> @Data
>> @SolrDocument(collection="mycollection")
>> public class Node {
>> 
>>   @Id
>>   @Field
>>   private String id;
>> 
>> 
>>   @Field
>>   private String title;
>> 
>> 
>> 
>> 
>> IMPORTANT NOTICE
>> 
>> This communication is from Deloitte LLP, a limited liability partnership 
>> registered in England and Wales with registered number OC303675. Its 
>> registered office is 1 New Street Square, London EC4A 3HQ, United Kingdom. 
>> Deloitte LLP is the United Kingdom affiliate of Deloitte NSE LLP, a member 
>> firm of Deloitte Touche Tohmatsu Limited, a UK private company limited by 
>> guarantee ("DTTL"). DTTL and each of its member firms are legally separate 
>> and independent entities. DTTL and 

RE: [EXT: NEWSLETTER] Re: SolrDocument difference between String and text_general

2020-10-20 Thread Cox, Owen
Hi Konstantinos, I think you're onto something there.  I don't think the 
collection was reloaded.  I've just tried the same code against a different 
collection that uses the same configset; the only difference is that this 
collection was created after the schema changes.  That works, so it must have 
been the reload that was missing.

Thanks!

Owen Cox
Senior Consultant | Deloitte MCS Limited
D: +44 20 7007 1657
o...@deloitte.co.uk | www.deloitte.co.uk



Re: SolrDocument difference between String and text_general

2020-10-20 Thread Konstantinos Koukouvis
Hi Owen, 

If I understand correctly, you have changed the schema, then reloaded the core 
and reindexed all the data, right? Whenever I have hit this error, I had usually 
forgotten to do one of those two things…

Regards,
Konstantinos








SolrDocument difference between String and text_general

2020-10-20 Thread Cox, Owen
Hi folks,

I'm using Solr 8.5.2 and populating documents which include a string field 
called "title".  This field used to be text_general, but the data was reindexed 
and we've been inserting data happily with REST calls and it's been behaving as 
desired.

I've now written a Java Spring-Boot program to populate documents (snippet 
below) using SolrCrudRepository.  This works when I don't index the "title" 
field, but when I try to include title I get the following error "cannot change 
field "title" from index options=DOCS_AND_FREQS_AND_POSITIONS to inconsistent 
index options=DOCS"

To me that looks like it's trying to index the title as text_general and store 
it in a string field.  But the Solr schema states that the field is string, all 
of the data in it is string, and every other string field in the document is 
indexed correctly.

Could there be any hanging reference to the field's type anywhere?  Or some 
requirement that a field named "title" is always text_general or something odd 
like that?

Any help appreciated, thanks
Owen



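// Imports are assumed (not shown in the original post): Lombok, SolrJ, Spring Data Solr.
import lombok.Data;
import org.apache.solr.client.solrj.beans.Field;
import org.springframework.data.annotation.Id;
import org.springframework.data.solr.core.mapping.SolrDocument;

@Data                                      // Lombok: generates getters, setters, equals/hashCode, toString
@SolrDocument(collection="mycollection")   // maps this class to the "mycollection" collection
public class Node {

    @Id
    @Field                                 // indexed under the Java field name, "id"
    private String id;

    @Field                                 // indexed under "title"; the Solr schema decides its type
    private String title;

    // (snippet truncated in the original post)
}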



