If you reindex, I’ve become a big fan of adding a date field with an index timestamp. That will allow you to check whether everything has been reindexed.
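As a rough sketch of that check, the snippet below builds the query parameters for counting documents that have not been touched since a reindex began. The field name indexed_datetime matches the definition in this message; the helper name stale_docs_params and the start time are made up for illustration, and the parameters would be sent to whatever /select endpoint your collection uses.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def stale_docs_params(reindex_started: datetime, field: str = "indexed_datetime") -> dict:
    """Build Solr query params that count docs not indexed since reindex_started."""
    # Solr date fields expect UTC timestamps in ISO 8601 form with a trailing Z.
    cutoff = reindex_started.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        # Everything, minus docs stamped at or after the cutoff. Docs that
        # predate the field and carry no value at all are counted as well.
        "q": f"*:* -{field}:[{cutoff} TO *]",
        "rows": 0,  # only the numFound count matters here
        "wt": "json",
    }


# Hypothetical start time of the reindex run.
started = datetime(2020, 7, 28, 21, 11, 0, tzinfo=timezone.utc)
params = stale_docs_params(started)
query_string = urlencode(params)
print(query_string)
```

When numFound for that query drops to zero, every document has been rewritten since the reindex started.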
<field name="indexed_datetime" type="date" stored="true" indexed="true" multiValued="false" default="NOW" docValues="true" />

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Jul 28, 2020, at 2:11 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>
> A regex search at query time would leave room for attacks (e.g. a regex can
> easily be designed to block the Solr server forever).
>
> If the field is stored you can also try to use a cursor to go through all
> entries and reindex each doc based on the field:
>
> https://lucene.apache.org/solr/guide/8_4/pagination-of-results.html
>
> This would also imply that you have the other fields stored. Otherwise
> reindex.
> You can do this in parallel to the existing index and once finished simply
> change the alias for the collection (that would be without any downtime for
> the users, but of course you need the corresponding space).
>
>> On 28 Jul 2020, at 21:06, lstusr 5u93n4 <lstusr...@gmail.com> wrote:
>>
>> Possible... yes. Agreed that this is the right approach. But if we already
>> have a big index that we're searching through? Any way to "hack it"?
>>
>>> On Tue, 28 Jul 2020 at 14:55, Walter Underwood <wun...@wunderwood.org>
>>> wrote:
>>>
>>> I’d do that at index time. Add an update request processor script that
>>> does the regex and adds a field has_credit_card_number:true.
>>>
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>>
>>>>> On Jul 28, 2020, at 11:50 AM, lstusr 5u93n4 <lstusr...@gmail.com> wrote:
>>>>
>>>> Let's say I have a text field that's been indexed with the standard
>>>> tokenizer, and I want to match the docs that have credit card numbers in
>>>> them (this is for altruistic purposes, not nefarious ones!). What's the
>>>> best way to build a search that will do this?
>>>>
>>>> Searching for "???? ???? ???? ????" seems to return inconsistent results.
>>>>
>>>> Maybe a regex search? "[0-9]{4}?[0-9]{4}?[0-9]{4}?[0-9]{4}" seems like it
>>>> should work, but that's not matching the docs I think it should either...
>>>>
>>>> Any suggestions?
>>>>
>>>> Thanks In Advance!
>>>
>>>
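
For what it's worth, the query-time regex most likely fails because the standard tokenizer splits a number like "1234 5678 9012 3456" into four separate terms, and a Lucene regexp query has to match one whole term. The index-time approach from earlier in the thread sidesteps that by running the regex over the raw text. Here is a minimal sketch of that logic in Python. It is not Solr's update processor API; the function names and the "body" field are made up for illustration, and the Luhn checksum is an extra step (not mentioned in the thread) to cut down on false positives.

```python
import re

# Four groups of four digits, optionally separated by a space or hyphen,
# and not embedded in a longer run of digits.
CANDIDATE = re.compile(r"(?<!\d)\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}(?!\d)")


def luhn_ok(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def has_credit_card_number(text: str) -> bool:
    """Return True if the text contains a 16-digit run that passes Luhn."""
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_ok(digits):
            return True
    return False


def flag_document(doc: dict, text_field: str = "body") -> dict:
    """Return a copy of doc with a has_credit_card_number boolean added."""
    flagged = dict(doc)
    flagged["has_credit_card_number"] = has_credit_card_number(str(doc.get(text_field, "")))
    return flagged
```

With the flag stored on each document, the search becomes a plain filter on has_credit_card_number:true and no regex runs at query time.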