[ https://issues.apache.org/jira/browse/LUCENE-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16764370#comment-16764370 ]
Ankit Jain edited comment on LUCENE-8635 at 2/10/19 9:51 AM:
-------------------------------------------------------------

I added print statements while running the benchmarks, and the classification looks correct:
{code}
Initializing field offheap start=55 field=Date.taxonomy
Initializing field offheap start=76 field=DayOfYear.sortedset
Initializing field offheap start=97 field=Month.sortedset
Initializing field offheap start=118 field=body
Initializing field onheap start=267 field=date
Initializing field onheap start=289 field=groupend
Initializing field onheap start=311 field=id
Initializing field onheap start=333 field=title
{code}
Though, when I restricted tests to PKLookups only using comp.addTaskPattern('PKLookup') in localrun.py, results look as expected:
{code:title=wikimedium10k|borderStyle=solid}
Task       QPS baseline   StdDev   QPS candidate   StdDev   Pct diff
PKLookup         163.29   (1.6%)          164.80   (2.1%)   0.9% (-2% - 4%)
{code}
{code:title=wikimedium10m|borderStyle=solid}
Task       QPS baseline   StdDev   QPS candidate   StdDev   Pct diff
PKLookup         114.29   (1.7%)          114.73   (1.2%)   0.4% ( -2% - 3%)
{code}
It seems we are good with this change then.
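As a quick sanity check on the tables above, the Pct diff column can be reproduced from the two QPS means. The sketch below assumes Pct diff is simply the relative change of candidate QPS over baseline QPS; the class and method names are made up for illustration and are not part of the benchmark tooling.

```java
import java.util.Locale;

// Hypothetical helper, not part of luceneutil.
final class PctDiff {
    /** Relative change of candidate QPS over baseline QPS, in percent. */
    static double pctDiff(double baselineQps, double candidateQps) {
        return (candidateQps - baselineQps) / baselineQps * 100.0;
    }

    public static void main(String[] args) {
        // QPS means copied from the PKLookup rows above.
        System.out.println(String.format(Locale.ROOT, "wikimedium10k: %.1f%%", pctDiff(163.29, 164.80)));
        System.out.println(String.format(Locale.ROOT, "wikimedium10m: %.1f%%", pctDiff(114.29, 114.73)));
    }
}
```

Both differences are smaller than the reported standard deviations, which is consistent with reading the PKLookup results as unchanged.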
was (Author: akjain):
I added print statements while running the benchmarks, and the classification looks correct:
```
Initializing field offheap start=55 field=Date.taxonomy
Initializing field offheap start=76 field=DayOfYear.sortedset
Initializing field offheap start=97 field=Month.sortedset
Initializing field offheap start=118 field=body
Initializing field onheap start=267 field=date
Initializing field onheap start=289 field=groupend
Initializing field onheap start=311 field=id
Initializing field onheap start=333 field=title
```
Though, when I restricted tests to PKLookups only using comp.addTaskPattern('PKLookup') in localrun.py, results look as expected:
```
wikimedium10k
Task       QPS baseline   StdDev   QPS candidate   StdDev   Pct diff
PKLookup         163.29   (1.6%)          164.80   (2.1%)   0.9% (-2% - 4%)
```
```
wikimedium10m
Task       QPS baseline   StdDev   QPS candidate   StdDev   Pct diff
PKLookup         114.29   (1.7%)          114.73   (1.2%)   0.4% ( -2% - 3%)
```
I guess we are good then.

> Lazy loading Lucene FST offheap using mmap
> ------------------------------------------
>
> Key: LUCENE-8635
> URL: https://issues.apache.org/jira/browse/LUCENE-8635
> Project: Lucene - Core
> Issue Type: New Feature
> Components: core/FSTs
> Environment: I used below setup for es_rally tests:
> single node i3.xlarge running ES 6.5
> es_rally was running on another i3.xlarge instance
> Reporter: Ankit Jain
> Priority: Major
> Attachments: fst-offheap-ra-rev.patch, fst-offheap-rev.patch, offheap.patch, optional_offheap_ra.patch, ra.patch, rally_benchmark.xlsx
>
>
> Currently, FST loads all the terms into heap memory during index open. This causes frequent JVM OOM issues if the term size gets big. A better way of doing this will be to lazily load FST using mmap. That ensures only the required terms get loaded into memory.
>
> Lucene can expose API for providing list of fields to load terms offheap.
> I'm planning to take the following approach for this:
> # Add a boolean property fstOffHeap in FieldInfo
> # Pass the list of offheap fields to Lucene during index open (ALL can be a special keyword for loading ALL fields offheap)
> # Initialize the fstOffHeap property during Lucene index open
> # FieldReader invokes the default FST constructor or the OffHeap constructor based on the fstOffHeap field
>
> I created a patch (that loads all fields offheap), did some benchmarks using es_rally, and the results look good.
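
The four-step approach in the issue description can be sketched in plain Java roughly as follows. This is a minimal illustration using only java.nio, not the attached patch: FstLoadSketch, isOffHeap, open and the ALL constant are hypothetical names, and the buffer returned here merely stands in for whatever the real FST constructors would consume.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Set;

// Hypothetical illustration only; none of these names are Lucene APIs.
final class FstLoadSketch {
    /** Assumed special keyword meaning "load every field off heap". */
    static final String ALL = "ALL";

    /** Steps 1-3: derive the per-field off-heap flag from the configured field list. */
    static boolean isOffHeap(String field, Set<String> offHeapFields) {
        return offHeapFields.contains(ALL) || offHeapFields.contains(field);
    }

    /** Step 4: choose the loading strategy based on the per-field flag. */
    static ByteBuffer open(Path file, boolean offHeap) throws IOException {
        if (offHeap) {
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
                // Memory-map the file: the OS pages bytes in lazily on access,
                // and the mapping remains valid after the channel is closed.
                return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            }
        }
        // On-heap path: read the whole file into a Java byte[] up front.
        return ByteBuffer.wrap(Files.readAllBytes(file));
    }
}
```

With the mapped variant, only the pages that lookups actually touch need to be resident, which is the property the description relies on to avoid heap pressure from large term dictionaries.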