Re: [PR] OAK-10549 Improve performance of facet count at scale (Lucene) [jackrabbit-oak]

2023-11-17 Thread via GitHub


thomasmueller merged PR #1215:
URL: https://github.com/apache/jackrabbit-oak/pull/1215


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] OAK-10549 Improve performance of facet count at scale (Lucene) [jackrabbit-oak]

2023-11-17 Thread via GitHub


thomasmueller commented on code in PR #1215:
URL: https://github.com/apache/jackrabbit-oak/pull/1215#discussion_r1396951519


##
oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LucenePropertyIndex.java:
##
@@ -1644,11 +1647,39 @@ public List getFacets(int numberOfFacets, String 
columnName) throws IOExc
 return cachedResults.get(cacheKey);
 }
 LOG.trace("columnName = {} facet Data not present in cache...", 
columnName);
+if (EAGER_FACET_CACHE_FILL) {

Review Comment:
   I prefer to avoid concurrency. it is filled when the first one is accessed, 
so I think it "should be fine". (I know, it sounds like "famous last words")



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] OAK-10549 Improve performance of facet count at scale (Lucene) [jackrabbit-oak]

2023-11-17 Thread via GitHub


tihom88 commented on code in PR #1215:
URL: https://github.com/apache/jackrabbit-oak/pull/1215#discussion_r1396835757


##
oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LucenePropertyIndex.java:
##
@@ -1644,11 +1647,39 @@ public List getFacets(int numberOfFacets, String 
columnName) throws IOExc
 return cachedResults.get(cacheKey);
 }
 LOG.trace("columnName = {} facet Data not present in cache...", 
columnName);
+if (EAGER_FACET_CACHE_FILL) {

Review Comment:
   As we are filling cache with all facets. Would it make sense to initialise 
it in constructor itself using future. Like @fabriziofortino did in 
https://github.com/apache/jackrabbit-oak/pull/1217



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] OAK-10549 Improve performance of facet count at scale (Lucene) [jackrabbit-oak]

2023-11-16 Thread via GitHub


sonarcloud[bot] commented on PR #1215:
URL: https://github.com/apache/jackrabbit-oak/pull/1215#issuecomment-1814509482

   Kudos, SonarCloud Quality Gate passed!  [![Quality Gate 
passed](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/QualityGateBadge/passed-16px.png
 'Quality Gate 
passed')](https://sonarcloud.io/dashboard?id=org.apache.jackrabbit%3Ajackrabbit-oak=1215)
   
   
[![Bug](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/bug-16px.png
 
'Bug')](https://sonarcloud.io/project/issues?id=org.apache.jackrabbit%3Ajackrabbit-oak=1215=false=BUG)
 
[![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png
 
'A')](https://sonarcloud.io/project/issues?id=org.apache.jackrabbit%3Ajackrabbit-oak=1215=false=BUG)
 [0 
Bugs](https://sonarcloud.io/project/issues?id=org.apache.jackrabbit%3Ajackrabbit-oak=1215=false=BUG)
  
   
[![Vulnerability](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/vulnerability-16px.png
 
'Vulnerability')](https://sonarcloud.io/project/issues?id=org.apache.jackrabbit%3Ajackrabbit-oak=1215=false=VULNERABILITY)
 
[![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png
 
'A')](https://sonarcloud.io/project/issues?id=org.apache.jackrabbit%3Ajackrabbit-oak=1215=false=VULNERABILITY)
 [0 
Vulnerabilities](https://sonarcloud.io/project/issues?id=org.apache.jackrabbit%3Ajackrabbit-oak=1215=false=VULNERABILITY)
  
   [![Security 
Hotspot](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/security_hotspot-16px.png
 'Security 
Hotspot')](https://sonarcloud.io/project/security_hotspots?id=org.apache.jackrabbit%3Ajackrabbit-oak=1215=false=SECURITY_HOTSPOT)
 
[![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png
 
'A')](https://sonarcloud.io/project/security_hotspots?id=org.apache.jackrabbit%3Ajackrabbit-oak=1215=false=SECURITY_HOTSPOT)
 [0 Security 
Hotspots](https://sonarcloud.io/project/security_hotspots?id=org.apache.jackrabbit%3Ajackrabbit-oak=1215=false=SECURITY_HOTSPOT)
  
   [![Code 
Smell](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/code_smell-16px.png
 'Code 
Smell')](https://sonarcloud.io/project/issues?id=org.apache.jackrabbit%3Ajackrabbit-oak=1215=false=CODE_SMELL)
 
[![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png
 
'A')](https://sonarcloud.io/project/issues?id=org.apache.jackrabbit%3Ajackrabbit-oak=1215=false=CODE_SMELL)
 [4 Code 
Smells](https://sonarcloud.io/project/issues?id=org.apache.jackrabbit%3Ajackrabbit-oak=1215=false=CODE_SMELL)
   
   
[![87.0%](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/CoverageChart/60-16px.png
 
'87.0%')](https://sonarcloud.io/component_measures?id=org.apache.jackrabbit%3Ajackrabbit-oak=1215=new_coverage=list)
 [87.0% 
Coverage](https://sonarcloud.io/component_measures?id=org.apache.jackrabbit%3Ajackrabbit-oak=1215=new_coverage=list)
  
   
[![0.0%](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/Duplications/3-16px.png
 
'0.0%')](https://sonarcloud.io/component_measures?id=org.apache.jackrabbit%3Ajackrabbit-oak=1215=new_duplicated_lines_density=list)
 [0.0% 
Duplication](https://sonarcloud.io/component_measures?id=org.apache.jackrabbit%3Ajackrabbit-oak=1215=new_duplicated_lines_density=list)
   
   
![warning](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/message_warning-16px.png
 'warning') The version of Java (11.0.21) you have used to run this analysis is 
deprecated and we will stop accepting it soon. Please update to at least Java 
17.
   Read more 
[here](https://docs.sonarsource.com/sonarcloud/appendices/scanner-environment/)
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] OAK-10549 Improve performance of facet count at scale (Lucene) [jackrabbit-oak]

2023-11-16 Thread via GitHub


thomasmueller commented on code in PR #1215:
URL: https://github.com/apache/jackrabbit-oak/pull/1215#discussion_r1395684061


##
oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LucenePropertyIndex.java:
##
@@ -1677,6 +1708,28 @@ private List getFacetsUncached(int 
numberOfFacets, String columnName) thr
 indexNode.release();
 }
 }
+
+private List getFacetsUncached(Facets facets, int 
numberOfFacets, String columnName) throws IOException {

Review Comment:
   I did this on purpose actually: to keep the old code as is as much as 
possible, in case we need to switch back to it. I agree it would be better, but 
hopefully we don't need the old code in the future - in which case we should 
remove it. I'm fine to create a jira issue about removing the old code in 6 
month, so we don't forget. OK?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] OAK-10549 Improve performance of facet count at scale (Lucene) [jackrabbit-oak]

2023-11-16 Thread via GitHub


fabriziofortino commented on code in PR #1215:
URL: https://github.com/apache/jackrabbit-oak/pull/1215#discussion_r1395604781


##
oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LucenePropertyIndex.java:
##
@@ -1644,11 +1647,39 @@ public List getFacets(int numberOfFacets, String 
columnName) throws IOExc
 return cachedResults.get(cacheKey);
 }
 LOG.trace("columnName = {} facet Data not present in cache...", 
columnName);
+if (EAGER_FACET_CACHE_FILL) {
+fillFacetCache(numberOfFacets);
+if (cachedResults.containsKey(cacheKey)) {
+LOG.trace("columnName = {} now found");

Review Comment:
   `cacheKey` is missing
   ```suggestion
   LOG.trace("columnName = {} now found", cacheKey);
   ```



##
oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LucenePropertyIndex.java:
##
@@ -1677,6 +1708,28 @@ private List getFacetsUncached(int 
numberOfFacets, String columnName) thr
 indexNode.release();
 }
 }
+
+private List getFacetsUncached(Facets facets, int 
numberOfFacets, String columnName) throws IOException {
+String facetFieldName = FulltextIndex.parseFacetField(columnName);
+try {
+ImmutableList.Builder res = new 
ImmutableList.Builder<>();
+FacetResult topChildren = 
facets.getTopChildren(numberOfFacets, facetFieldName);
+if (topChildren == null) {
+return null;
+}
+for (LabelAndValue lav : topChildren.labelValues) {
+res.add(new Facet(
+lav.label, lav.value.intValue()
+));
+}
+return res.build();
+} catch (IllegalArgumentException iae) {
+LOG.debug(iae.getMessage(), iae);
+LOG.warn("facets for {} not yet indexed: " + iae, 
facetFieldName);

Review Comment:
   this message mixes placeholders + interpolation. I would prefer to use just 
placeholders:
   ```suggestion
   LOG.warn("facets for {} not yet indexed: {}", 
facetFieldName, iae);
   ```



##
oak-lucene/src/test/java/org/apache/jackrabbit/oak/plugins/index/lucene/hybrid/ManyFacetsTest.java:
##
@@ -0,0 +1,280 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.jackrabbit.oak.plugins.index.lucene.hybrid;
+
+import static 
org.apache.jackrabbit.guava.common.util.concurrent.MoreExecutors.newDirectExecutorService;
+import static 
org.apache.jackrabbit.oak.plugins.index.search.FulltextIndexConstants.PROP_FACETS;
+import static 
org.apache.jackrabbit.oak.plugins.index.search.FulltextIndexConstants.STATISTICAL_FACET_SAMPLE_SIZE_DEFAULT;
+import static 
org.apache.jackrabbit.oak.spi.mount.Mounts.defaultMountInfoProvider;
+import static org.junit.Assert.assertNotNull;
+import static org.junit.Assert.assertTrue;
+import static org.junit.Assert.fail;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.Properties;
+import java.util.Random;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.TimeUnit;
+import java.util.function.Predicate;
+
+import javax.jcr.GuestCredentials;
+import javax.jcr.Repository;
+import javax.jcr.RepositoryException;
+import javax.jcr.Session;
+import javax.jcr.query.Query;
+import javax.jcr.query.QueryManager;
+import javax.jcr.query.QueryResult;
+import javax.jcr.query.Row;
+import javax.jcr.query.RowIterator;
+
+import org.apache.jackrabbit.oak.InitialContent;
+import org.apache.jackrabbit.oak.Oak;
+import org.apache.jackrabbit.oak.api.ContentRepository;
+import org.apache.jackrabbit.oak.api.Tree;
+import org.apache.jackrabbit.oak.api.Type;
+import org.apache.jackrabbit.oak.commons.PathUtils;
+import org.apache.jackrabbit.oak.commons.concurrent.ExecutorCloser;
+import org.apache.jackrabbit.oak.commons.json.JsonObject;
+import org.apache.jackrabbit.oak.jcr.Jcr;
+import org.apache.jackrabbit.oak.plugins.index.AsyncIndexUpdate;
+import