Re: [PR] OAK-10549 Improve performance of facet count at scale (Lucene) [jackrabbit-oak]
thomasmueller merged PR #1215: URL: https://github.com/apache/jackrabbit-oak/pull/1215 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@jackrabbit.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] OAK-10549 Improve performance of facet count at scale (Lucene) [jackrabbit-oak]
thomasmueller commented on code in PR #1215: URL: https://github.com/apache/jackrabbit-oak/pull/1215#discussion_r1396951519 ## oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LucenePropertyIndex.java: ## @@ -1644,11 +1647,39 @@ public List getFacets(int numberOfFacets, String columnName) throws IOExc return cachedResults.get(cacheKey); } LOG.trace("columnName = {} facet Data not present in cache...", columnName); +if (EAGER_FACET_CACHE_FILL) { Review Comment: I prefer to avoid concurrency. it is filled when the first one is accessed, so I think it "should be fine". (I know, it sounds like "famous last words") -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@jackrabbit.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] OAK-10549 Improve performance of facet count at scale (Lucene) [jackrabbit-oak]
tihom88 commented on code in PR #1215: URL: https://github.com/apache/jackrabbit-oak/pull/1215#discussion_r1396835757 ## oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LucenePropertyIndex.java: ## @@ -1644,11 +1647,39 @@ public List getFacets(int numberOfFacets, String columnName) throws IOExc return cachedResults.get(cacheKey); } LOG.trace("columnName = {} facet Data not present in cache...", columnName); +if (EAGER_FACET_CACHE_FILL) { Review Comment: As we are filling cache with all facets. Would it make sense to initialise it in constructor itself using future. Like @fabriziofortino did in https://github.com/apache/jackrabbit-oak/pull/1217 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@jackrabbit.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] OAK-10549 Improve performance of facet count at scale (Lucene) [jackrabbit-oak]
sonarcloud[bot] commented on PR #1215: URL: https://github.com/apache/jackrabbit-oak/pull/1215#issuecomment-1814509482 Kudos, SonarCloud Quality Gate passed! [![Quality Gate passed](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/QualityGateBadge/passed-16px.png 'Quality Gate passed')](https://sonarcloud.io/dashboard?id=org.apache.jackrabbit%3Ajackrabbit-oak=1215) [![Bug](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/bug-16px.png 'Bug')](https://sonarcloud.io/project/issues?id=org.apache.jackrabbit%3Ajackrabbit-oak=1215=false=BUG) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=org.apache.jackrabbit%3Ajackrabbit-oak=1215=false=BUG) [0 Bugs](https://sonarcloud.io/project/issues?id=org.apache.jackrabbit%3Ajackrabbit-oak=1215=false=BUG) [![Vulnerability](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/vulnerability-16px.png 'Vulnerability')](https://sonarcloud.io/project/issues?id=org.apache.jackrabbit%3Ajackrabbit-oak=1215=false=VULNERABILITY) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=org.apache.jackrabbit%3Ajackrabbit-oak=1215=false=VULNERABILITY) [0 Vulnerabilities](https://sonarcloud.io/project/issues?id=org.apache.jackrabbit%3Ajackrabbit-oak=1215=false=VULNERABILITY) [![Security Hotspot](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/security_hotspot-16px.png 'Security Hotspot')](https://sonarcloud.io/project/security_hotspots?id=org.apache.jackrabbit%3Ajackrabbit-oak=1215=false=SECURITY_HOTSPOT) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/security_hotspots?id=org.apache.jackrabbit%3Ajackrabbit-oak=1215=false=SECURITY_HOTSPOT) [0 Security Hotspots](https://sonarcloud.io/project/security_hotspots?id=org.apache.jackrabbit%3Ajackrabbit-oak=1215=false=SECURITY_HOTSPOT) [![Code Smell](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/code_smell-16px.png 'Code Smell')](https://sonarcloud.io/project/issues?id=org.apache.jackrabbit%3Ajackrabbit-oak=1215=false=CODE_SMELL) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=org.apache.jackrabbit%3Ajackrabbit-oak=1215=false=CODE_SMELL) [4 Code Smells](https://sonarcloud.io/project/issues?id=org.apache.jackrabbit%3Ajackrabbit-oak=1215=false=CODE_SMELL) [![87.0%](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/CoverageChart/60-16px.png '87.0%')](https://sonarcloud.io/component_measures?id=org.apache.jackrabbit%3Ajackrabbit-oak=1215=new_coverage=list) [87.0% Coverage](https://sonarcloud.io/component_measures?id=org.apache.jackrabbit%3Ajackrabbit-oak=1215=new_coverage=list) [![0.0%](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/Duplications/3-16px.png '0.0%')](https://sonarcloud.io/component_measures?id=org.apache.jackrabbit%3Ajackrabbit-oak=1215=new_duplicated_lines_density=list) [0.0% Duplication](https://sonarcloud.io/component_measures?id=org.apache.jackrabbit%3Ajackrabbit-oak=1215=new_duplicated_lines_density=list) ![warning](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/message_warning-16px.png 'warning') The version of Java (11.0.21) you have used to run this analysis is deprecated and we will stop accepting it soon. Please update to at least Java 17. Read more [here](https://docs.sonarsource.com/sonarcloud/appendices/scanner-environment/) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@jackrabbit.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] OAK-10549 Improve performance of facet count at scale (Lucene) [jackrabbit-oak]
thomasmueller commented on code in PR #1215: URL: https://github.com/apache/jackrabbit-oak/pull/1215#discussion_r1395684061 ## oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LucenePropertyIndex.java: ## @@ -1677,6 +1708,28 @@ private List getFacetsUncached(int numberOfFacets, String columnName) thr indexNode.release(); } } + +private List getFacetsUncached(Facets facets, int numberOfFacets, String columnName) throws IOException { Review Comment: I did this on purpose actually: to keep the old code as is as much as possible, in case we need to switch back to it. I agree it would be better, but hopefully we don't need the old code in the future - in which case we should remove it. I'm fine to create a jira issue about removing the old code in 6 month, so we don't forget. OK? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@jackrabbit.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] OAK-10549 Improve performance of facet count at scale (Lucene) [jackrabbit-oak]
fabriziofortino commented on code in PR #1215: URL: https://github.com/apache/jackrabbit-oak/pull/1215#discussion_r1395604781 ## oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LucenePropertyIndex.java: ## @@ -1644,11 +1647,39 @@ public List getFacets(int numberOfFacets, String columnName) throws IOExc return cachedResults.get(cacheKey); } LOG.trace("columnName = {} facet Data not present in cache...", columnName); +if (EAGER_FACET_CACHE_FILL) { +fillFacetCache(numberOfFacets); +if (cachedResults.containsKey(cacheKey)) { +LOG.trace("columnName = {} now found"); Review Comment: `cacheKey` is missing ```suggestion LOG.trace("columnName = {} now found", cacheKey); ``` ## oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LucenePropertyIndex.java: ## @@ -1677,6 +1708,28 @@ private List getFacetsUncached(int numberOfFacets, String columnName) thr indexNode.release(); } } + +private List getFacetsUncached(Facets facets, int numberOfFacets, String columnName) throws IOException { +String facetFieldName = FulltextIndex.parseFacetField(columnName); +try { +ImmutableList.Builder res = new ImmutableList.Builder<>(); +FacetResult topChildren = facets.getTopChildren(numberOfFacets, facetFieldName); +if (topChildren == null) { +return null; +} +for (LabelAndValue lav : topChildren.labelValues) { +res.add(new Facet( +lav.label, lav.value.intValue() +)); +} +return res.build(); +} catch (IllegalArgumentException iae) { +LOG.debug(iae.getMessage(), iae); +LOG.warn("facets for {} not yet indexed: " + iae, facetFieldName); Review Comment: this message mixes placeholders + interpolation. I would prefer to use just placeholders: ```suggestion LOG.warn("facets for {} not yet indexed: {}", facetFieldName, iae); ``` ## oak-lucene/src/test/java/org/apache/jackrabbit/oak/plugins/index/lucene/hybrid/ManyFacetsTest.java: ## @@ -0,0 +1,280 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.jackrabbit.oak.plugins.index.lucene.hybrid; + +import static org.apache.jackrabbit.guava.common.util.concurrent.MoreExecutors.newDirectExecutorService; +import static org.apache.jackrabbit.oak.plugins.index.search.FulltextIndexConstants.PROP_FACETS; +import static org.apache.jackrabbit.oak.plugins.index.search.FulltextIndexConstants.STATISTICAL_FACET_SAMPLE_SIZE_DEFAULT; +import static org.apache.jackrabbit.oak.spi.mount.Mounts.defaultMountInfoProvider; +import static org.junit.Assert.assertNotNull; +import static org.junit.Assert.assertTrue; +import static org.junit.Assert.fail; + +import java.io.File; +import java.io.IOException; +import java.util.Properties; +import java.util.Random; +import java.util.concurrent.ExecutorService; +import java.util.concurrent.Executors; +import java.util.concurrent.TimeUnit; +import java.util.function.Predicate; + +import javax.jcr.GuestCredentials; +import javax.jcr.Repository; +import javax.jcr.RepositoryException; +import javax.jcr.Session; +import javax.jcr.query.Query; +import javax.jcr.query.QueryManager; +import javax.jcr.query.QueryResult; +import javax.jcr.query.Row; +import javax.jcr.query.RowIterator; + +import org.apache.jackrabbit.oak.InitialContent; +import org.apache.jackrabbit.oak.Oak; +import org.apache.jackrabbit.oak.api.ContentRepository; +import org.apache.jackrabbit.oak.api.Tree; +import org.apache.jackrabbit.oak.api.Type; +import org.apache.jackrabbit.oak.commons.PathUtils; +import org.apache.jackrabbit.oak.commons.concurrent.ExecutorCloser; +import org.apache.jackrabbit.oak.commons.json.JsonObject; +import org.apache.jackrabbit.oak.jcr.Jcr; +import org.apache.jackrabbit.oak.plugins.index.AsyncIndexUpdate; +import