[ https://issues.apache.org/jira/browse/LUCENE-7648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15832135#comment-15832135 ]

Erick Erickson commented on LUCENE-7648:
----------------------------------------

I've also seen this. Even if you reindex all the docs without any of those 
fields, the field entries are still not purged. Your only real choice is to 
re-index into a new collection and stop abusing dynamic fields ;).

In a conversation I had, it was pointed out that purging unused fields during 
segment merges would impose a performance penalty on every regular merge, and 
penalizing everyone else to support this kind of scenario is not something I'd 
vote for.

I do wonder, though, whether there's any support for throwing an exception 
when some (configurable) field-count limit is exceeded. In the case I saw, the 
field explosion was a programming error rather than intentional.

But frankly, most of the situations I see with this many fields are either 
programming errors or should be approached in a different way (payloads, 
indexing fieldX_keyword tokens in a single field rather than giving each 
keyword its own field, and the like). Not always possible of course....
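The "fieldX_keyword tokens" suggestion above amounts to collapsing many per-keyword dynamic fields into one shared field whose tokens carry the field name as a prefix. A minimal sketch of that encoding, with an invented class and separator (nothing here is a Lucene or Solr API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of the alternative mentioned above: instead of
 * one dynamic field per keyword (color_red_b:true, color_blue_b:true,
 * ...), encode each (attribute, value) pair as a single token in one
 * shared keyword field, keeping the schema's field count constant no
 * matter how many attributes the documents carry.
 */
public class KeywordEncoder {
    /**
     * color=red becomes the token "color:red"; at query time a term
     * query on the shared field for "color:red" replaces a query on
     * the dynamic field color_red_b.
     */
    public static List<String> encode(Map<String, String> attrs) {
        List<String> tokens = new ArrayList<>();
        for (Map.Entry<String, String> e : attrs.entrySet()) {
            tokens.add(e.getKey() + ":" + e.getValue());
        }
        return tokens;
    }
}
```

The trade-off is that per-attribute features like faceting or range queries no longer come for free, which is why this approach is, as noted, not always possible.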

> Millions of fields in an index makes some operations slow, opening a new 
> searcher in particular
> -----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-7648
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7648
>             Project: Lucene - Core
>          Issue Type: Improvement
>    Affects Versions: 4.10.4
>            Reporter: Shawn Heisey
>            Priority: Minor
>
> Got a Solr user who was experiencing very slow commit times on their index -- 
> 10 seconds or more.  This is on a 650K document index sized at about 420MB, 
> with all Solr cache autowarm counts at zero.
> After some profiling of their Solr install, they finally determined that the 
> problem was an abuse of dynamic fields.  The largest .fnm file in their index 
> was 130MB, with the total of all .fnm files at 140MB.  The user estimates 
> that they have about 2 million fields in this index.  They will be fixing the 
> situation so the field count is more reasonable.
> While I do understand that millions of fields in an index is a pathological 
> setup, and that some parts of Lucene operation are always going to be slow on 
> an index like that, 10 seconds for a new searcher seemed excessive to me.  
> Perhaps there is an opportunity for a *little* bit of optimization?
> The version is old -- 4.10.4.  They have not yet tried a newer version.



