Re: Doing what <copyField> does using SolrJ API

Walter Underwood Thu, 17 Sep 2020 10:38:25 -0700

If you want to ignore a field being sent to Solr, you can set indexed=false and
stored=false for that field in schema.xml. It will take up room in schema.xml
but zero room on disk.


wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Sep 17, 2020, at 10:23 AM, Alexandre Rafalovitch <arafa...@gmail.com> 
> wrote:
> 
> Solr has a whole pipeline that you can run during document ingesting before
> the actual indexing happens. It is called Update Request Processor (URP)
> and is defined in solrconfig.xml or in an override file. Obviously, since
> you are indexing from SolrJ client, you have even more flexibility, but it
> is good to know about anyway.
> 
> You can read all about it at:
> https://lucene.apache.org/solr/guide/8_6/update-request-processors.html and
> see the extensive list of processors you can leverage. The specific one
> mentioned is:
> https://lucene.apache.org/solr/8_6_0//solr-core/org/apache/solr/update/processor/StatelessScriptUpdateProcessorFactory.html
> 
> Just a word of warning: the stateless script URP uses JavaScript, which is
> becoming a complicated story as the underlying JVM is upgraded (the JDK's
> Nashorn JavaScript engine was deprecated in JDK 11 and removed in JDK 15).
> So if one of the simpler URPs, or a chain of them, will do the job, that
> may be a better path to take.
> 
> Regards,
>   Alex.
> 
> 
> On Thu, 17 Sep 2020 at 13:13, Steven White <swhite4...@gmail.com> wrote:
> 
>> Thanks Erick.  Where can I learn more about the "stateless script update
>> processor factory"?  I don't know what you mean by this.
>> 
>> Steven
>> 
>> On Thu, Sep 17, 2020 at 1:08 PM Erick Erickson <erickerick...@gmail.com>
>> wrote:
>> 
>>> 1000 fields is fine; you'll waste some cycles on bookkeeping, but I really
>>> doubt you'll notice. That said, are these fields used for searching?
>>> Because you do have control over what goes into the index if you can put a
>>> "stateless script update processor factory" in your update chain. There you
>>> can do whatever you want, including combining all the fields into one and
>>> deleting the original fields. There's no point in having your index
>>> cluttered with unused fields. OTOH, it may not be worth the effort just to
>>> satisfy my sense of aesthetics 😉
>>> 
>>> On Thu, Sep 17, 2020, 12:59 Steven White <swhite4...@gmail.com> wrote:
>>> 
>>>> Hi Erick,
>>>> 
>>>> Yes, this is coming from a DB.  Unfortunately, I have no control over the
>>>> list of fields.  Out of the 1000 fields that there may be, no document
>>>> that gets indexed into Solr will use more than about 50, and since I'm
>>>> copying the values of those fields to the catch-all field, and the
>>>> catch-all field is my default search field, I don't expect any problem
>>>> from having 1000 fields in Solr's schema. Or should I?
>>>> 
>>>> Thanks
>>>> 
>>>> Steven
>>>> 
>>>> 
>>>> On Thu, Sep 17, 2020 at 8:23 AM Erick Erickson <erickerick...@gmail.com>
>>>> wrote:
>>>> 
>>>>> “there will be over 1000 of them [fields]”
>>>>> 
>>>>> This is often a red flag in my experience. Solr will handle that many
>>>>> fields; I’ve seen many more. But this is often a result of
>>>>> “database thinking”, i.e. your mental model of all this data comes
>>>>> from a DB perspective rather than a search perspective.
>>>>> 
>>>>> It’s unwieldy to have that many fields. Obviously I don’t know the
>>>>> particulars of your app, and maybe that’s the best design. But
>>>>> particularly if many of the fields are sparsely populated, i.e. only a
>>>>> small percentage of the documents in your corpus have any value for a
>>>>> given field, taking a step back and looking at the design might save
>>>>> you some grief down the line.
>>>>> 
>>>>> For instance, I’ve seen designs where instead of
>>>>> field1:some_value
>>>>> field2:other_value….
>>>>> 
>>>>> you use a single field with _tokens_ like:
>>>>> field:field1_some_value
>>>>> field:field2_other_value
>>>>> 
>>>>> That reduces complexity and improves performance.
>>>>> 
>>>>> Anyway, just a thought you might want to consider.
>>>>> 
>>>>> Best,
>>>>> Erick
>>>>> 
>>>>>> On Sep 16, 2020, at 9:31 PM, Steven White <swhite4...@gmail.com>
>>>>>> wrote:
>>>>>> 
>>>>>> Hi everyone,
>>>>>> 
>>>>>> I figured it out.  It is as simple as creating a List<String> and
>>>>>> passing it as the value argument of the SolrInputDocument.addField()
>>>>>> API.
>>>>>> 
>>>>>> Thanks,
>>>>>> 
>>>>>> Steven
>>>>>> 
>>>>>> 
>>>>>> On Wed, Sep 16, 2020 at 9:13 PM Steven White <swhite4...@gmail.com>
>>>>>> wrote:
>>>>>> 
>>>>>>> Hi everyone,
>>>>>>> 
>>>>>>> I want to avoid creating a <copyField dest="CatchAll"
>>>>>>> source="OneFieldOfMany"/> in my schema (there will be over 1000 of
>>>>>>> them, and maybe more, so managing it will be a pain).  Instead, I
>>>>>>> want to use the SolrJ API to do what <copyField/> does.  Any example
>>>>>>> of how I can do this?  If there is an example online, that would be
>>>>>>> great.
>>>>>>> 
>>>>>>> Thanks in advance.
>>>>>>> 
>>>>>>> Steven
>>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>> 
>>
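
For anyone finding this thread later, here is a rough sketch of two ideas
from the quoted messages: Steven's approach of collecting the source values
into a List<String> for the catch-all field, and Erick's suggestion of a
single field holding "fieldname_value" tokens. The class name, method names
and sample data are hypothetical and not from the thread. It uses plain
java.util collections so it runs without Solr on the classpath. The
commented-out lines show where the list would be handed to
SolrInputDocument.addField(), which is what Steven reports worked for him.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class; the names here are illustrative only.
class CatchAllSketch {

    // Collect every non-null source value as a string. This mimics what one
    // <copyField> per source field would feed into the CatchAll field.
    static List<String> catchAllValues(Map<String, Object> row) {
        List<String> values = new ArrayList<>();
        for (Object value : row.values()) {
            if (value != null) {
                values.add(String.valueOf(value));
            }
        }
        return values;
    }

    // Build "fieldName_value" tokens so that many sparse fields can be
    // represented as values of one multi-valued field.
    static List<String> fieldTokens(Map<String, Object> row) {
        List<String> tokens = new ArrayList<>();
        for (Map.Entry<String, Object> entry : row.entrySet()) {
            if (entry.getValue() != null) {
                tokens.add(entry.getKey() + "_" + entry.getValue());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Sample row standing in for one database record (made-up data).
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("field1", "some_value");
        row.put("field2", "other_value");
        row.put("field3", null); // unpopulated fields are skipped

        System.out.println(catchAllValues(row));
        System.out.println(fieldTokens(row));

        // With SolrJ available, the list would be passed as the field value:
        //   SolrInputDocument doc = new SolrInputDocument();
        //   doc.addField("CatchAll", catchAllValues(row));
    }
}
```

Whether the catch-all list or the token field fits better depends on how the
fields are searched, which the thread leaves open.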
