[jira] [Commented] (LUCENE-9969) DirectoryTaxonomyReader.taxoArray uses a large amount of memory, causing the system to OOM and crash

Michael McCandless (Jira) Mon, 30 Aug 2021 09:14:06 -0700


    [ 
https://issues.apache.org/jira/browse/LUCENE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17406807#comment-17406807
 ]


Michael McCandless commented on LUCENE-9969:
--------------------------------------------

Let's try to find a better data structure to do what these three costly arrays 
are doing for faceting.

The arrays are fully recomputed on each {{refresh}}, which is very costly.  
Each array has one entry per unique facet label, across all of the underlying 
indexed facet fields.  So that's 12 bytes of heap per unique facet label (three 
{{int}} arrays at 4 bytes per entry), times two during {{refresh}}, while the 
old and new arrays are both live.
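
To make the arithmetic concrete, here is a back-of-the-envelope sketch in plain 
Java.  It only restates the figures above (three arrays, 12 bytes per label, 
doubled during {{refresh}}); the class name, method names, and the label count 
in {{main}} are hypothetical, and array object headers are ignored.

```java
// Rough heap estimate for the three taxonomy arrays. Not Lucene code.
final class TaxoArrayHeapEstimate {
    // Three int[] arrays, as described in the comment above.
    static final int ARRAYS = 3;

    /** Steady-state bytes: 3 arrays * 4 bytes = 12 bytes per unique facet label. */
    static long steadyStateBytes(long uniqueLabels) {
        return uniqueLabels * ARRAYS * Integer.BYTES;
    }

    /** Peak bytes during refresh, assuming old and new arrays are both live. */
    static long refreshPeakBytes(long uniqueLabels) {
        return 2 * steadyStateBytes(uniqueLabels);
    }

    public static void main(String[] args) {
        long labels = 100_000_000L; // hypothetical number of unique facet labels
        System.out.println("steady state bytes: " + steadyStateBytes(labels));
        System.out.println("refresh peak bytes: " + refreshPeakBytes(labels));
    }
}
```

With the hypothetical 100 million labels this works out to about 1.2 GB at 
steady state and about 2.4 GB at the peak of a {{refresh}}.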

The arrays are used to look up which {{dimension}} is the parent of a given 
facet label (well, of its ordinal).  Non-hierarchical faceting uses only one 
array (either {{parent}} or {{children}}) to know how to collate all the facet 
ordinals seen into the "top N per dimension".  This is all necessary because 
taxonomy facets squish multiple dimensions (the full "hierarchy" if using 
hierarchical facets) into a single underlying Lucene indexed field.
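
As a rough illustration of that collation step, here is a simplified sketch in 
plain Java.  It is not the actual {{lucene/facet}} implementation: it assumes 
flat (non-hierarchical) facets, a {{parents}} array indexed by ordinal, ordinal 
0 as the root, and a per-ordinal {{counts}} array; the class and method names 
are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified "top N per dimension" roll-up using a parent array. Not Lucene code.
final class ParentArrayRollup {
    /**
     * Groups every counted ordinal under its parent ordinal (the dimension, for
     * flat facets) and keeps the topN children per dimension, highest count first.
     */
    static Map<Integer, List<Integer>> topNPerDimension(int[] parents, int[] counts, int topN) {
        Map<Integer, List<Integer>> byDimension = new TreeMap<>();
        // Ordinal 0 is assumed to be the root, so start at 1.
        for (int ord = 1; ord < parents.length; ord++) {
            int parent = parents[ord];
            // Skip the dimensions themselves (children of the root) and unseen labels.
            if (parent == 0 || counts[ord] == 0) {
                continue;
            }
            byDimension.computeIfAbsent(parent, k -> new ArrayList<>()).add(ord);
        }
        for (List<Integer> children : byDimension.values()) {
            // Sort by count descending, breaking ties by ordinal ascending.
            children.sort((a, b) -> counts[a] != counts[b]
                    ? Integer.compare(counts[b], counts[a])
                    : Integer.compare(a, b));
            if (children.size() > topN) {
                children.subList(topN, children.size()).clear();
            }
        }
        return byDimension;
    }

    public static void main(String[] args) {
        // Hypothetical taxonomy: 0=root, 1=Author, 2..3 and 7=authors, 4=Year, 5..6=years.
        int[] parents = {-1, 0, 1, 1, 0, 4, 4, 1};
        int[] counts = {0, 0, 5, 9, 0, 2, 7, 1};
        System.out.println(topNPerDimension(parents, counts, 2));
    }
}
```

The point of the sketch is that one {{int}} lookup per ordinal is enough to 
recover the dimension, which is the job any replacement data structure (doc 
values or otherwise) would have to take over.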

Maybe [~shaie] has some ideas on how we might use doc values (not available 
when {{lucene/facet}} was first added) instead?

> DirectoryTaxonomyReader.taxoArray uses a large amount of memory, causing the system to OOM and crash
> ------------------------------------------------
>
>                 Key: LUCENE-9969
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9969
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>    Affects Versions: 6.6.2
>            Reporter: FengFeng Cheng
>            Priority: Trivial
>         Attachments: image-2021-05-24-13-43-43-289.png
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
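> First, the data volume is very large. The JVM has 90 GB of memory, but 
> TaxonomyIndexArrays takes up almost half of it.
> !image-2021-05-24-13-43-43-289.png!
> Is there a better way to use TaxonomyReader, or some other optimization?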



