[ https://issues.apache.org/jira/browse/LUCENE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17406807#comment-17406807 ]
Michael McCandless commented on LUCENE-9969:
--------------------------------------------

Let's try to find a better data structure to do what these three costly arrays are doing for faceting.

The arrays are fully recomputed on each {{refresh}}, which is very costly. Each array has length equal to the cardinality of all your facet labels in each underlying indexed facet field. So that's 12 bytes of heap per unique facet label, times two during {{refresh}}.

The arrays are used to look up which {{dimension}} is the parent for a given facet label (well, from its ordinal). Non-hierarchical faceting uses only one array (either {{parent}} or {{children}}) to know how to collate all the facet ordinals seen into the "top N per dimension".

This is all necessary because taxonomy facets squish multiple dimensions (the full "hierarchy" if using hierarchical facets) into a single underlying Lucene indexed field.

Maybe [~shaie] has some ideas on how we might use doc values (not available when {{lucene/facet}} was first added) instead?

> DirectoryTaxonomyReader.taxoArray uses a large amount of memory, causing the system to crash with OOM
> ------------------------------------------------
>
>                 Key: LUCENE-9969
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9969
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>    Affects Versions: 6.6.2
>            Reporter: FengFeng Cheng
>            Priority: Trivial
>         Attachments: image-2021-05-24-13-43-43-289.png
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> First, the data volume is very large. The JVM heap is 90 GB, but TaxonomyIndexArrays takes up almost half of it.
> !image-2021-05-24-13-43-43-289.png!
> Is there a better way to use TaxonomyReader, or are there other optimizations?
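
To make the figures in the comment above concrete, here is a rough sketch in plain Java. It does not call any Lucene API. It assumes that the "12 bytes per unique facet label" comes from three {{int}} arrays with one 4-byte slot per ordinal, and that a parent array stores, for each ordinal, the ordinal of its parent, with the root at ordinal 0. The class name, method names, and the tiny sample taxonomy are all hypothetical and exist only for illustration.

```java
public class TaxoArraySketch {
    // Assumption: three int arrays, each with one entry per facet ordinal.
    static final int ARRAY_COUNT = 3;
    // Assumption: ordinal 0 is the taxonomy root; dimensions are its direct children.
    static final int ROOT = 0;

    /** Heap used by the arrays once loaded: 3 arrays * 4 bytes per label. */
    static long steadyStateBytes(long uniqueLabels) {
        return uniqueLabels * ARRAY_COUNT * Integer.BYTES;
    }

    /** During refresh, old and new arrays coexist, so the cost doubles. */
    static long refreshPeakBytes(long uniqueLabels) {
        return 2 * steadyStateBytes(uniqueLabels);
    }

    /**
     * Walks the parent array upward from a label's ordinal until it reaches
     * the ordinal whose parent is the root, i.e. the label's dimension.
     */
    static int dimensionOf(int[] parents, int ordinal) {
        if (ordinal <= ROOT || ordinal >= parents.length) {
            throw new IllegalArgumentException("not a label ordinal: " + ordinal);
        }
        int current = ordinal;
        while (parents[current] != ROOT) {
            current = parents[current];
        }
        return current;
    }

    public static void main(String[] args) {
        // Hypothetical taxonomy:
        // 0 = root, 1 = Author, 2 = Author/Bob, 3 = Author/Lisa,
        // 4 = Date, 5 = Date/2021
        int[] parents = {-1, 0, 1, 1, 0, 4};

        System.out.println("dimension of ordinal 3: " + dimensionOf(parents, 3));
        System.out.println("dimension of ordinal 5: " + dimensionOf(parents, 5));

        long labels = 100_000_000L; // illustrative label count, not from the issue
        System.out.println("steady state bytes: " + steadyStateBytes(labels));
        System.out.println("refresh peak bytes: " + refreshPeakBytes(labels));
    }
}
```

Under these assumptions, 100 million unique labels would cost about 1.2 GB of heap at rest and about 2.4 GB while a refresh is in progress, which shows why the cost scales directly with label cardinality.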