cpoerschke commented on a change in pull request #1571: URL: https://github.com/apache/lucene-solr/pull/1571#discussion_r522486838
########## File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/response/transform/LTRFeatureLoggerTransformerFactory.java ########## @@ -208,55 +216,116 @@ public void setContext(ResultContext context) { if (threadManager != null) { threadManager.setExecutor(context.getRequest().getCore().getCoreContainer().getUpdateShardHandler().getUpdateExecutor()); } - - // Setup LTRScoringQuery - scoringQuery = SolrQueryRequestContextUtils.getScoringQuery(req); - docsWereNotReranked = (scoringQuery == null); - String featureStoreName = SolrQueryRequestContextUtils.getFvStoreName(req); - if (docsWereNotReranked || (featureStoreName != null && (!featureStoreName.equals(scoringQuery.getScoringModel().getFeatureStoreName())))) { - // if store is set in the transformer we should overwrite the logger - final ManagedFeatureStore fr = ManagedFeatureStore.getManagedFeatureStore(req.getCore()); + LTRScoringQuery[] rerankingQueriesFromContext = SolrQueryRequestContextUtils.getScoringQueries(req); + docsWereNotReranked = (rerankingQueriesFromContext == null || rerankingQueriesFromContext.length == 0); + String transformerFeatureStore = SolrQueryRequestContextUtils.getFvStoreName(req); + Map<String, String[]> transformerExternalFeatureInfo = LTRQParserPlugin.extractEFIParams(localparams); - final FeatureStore store = fr.getFeatureStore(featureStoreName); - featureStoreName = store.getName(); // if featureStoreName was null before this gets actual name - - try { - final LoggingModel lm = new LoggingModel(loggingModelName, - featureStoreName, store.getFeatures()); + initLoggingModel(transformerFeatureStore); Review comment: LTRFeatureLoggerTransformerFactory.2 - `loggingModel` being a member of the transformer factory gives it `SolrCore` lifetime/scope but here it's initialised based on per-request parameters. If multiple threads use the same transformer factory object concurrently then they might trampled upon each other. https://github.com/cpoerschke/lucene-solr/commit/4912daccd596435f5c61ac1a3cf86eaebb039118 proposes to not have the logging model as a member of the transformer factory. ########## File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/response/transform/LTRFeatureLoggerTransformerFactory.java ########## @@ -79,6 +81,7 @@ private char csvFeatureSeparator = CSVFeatureLogger.DEFAULT_FEATURE_SEPARATOR; private LTRThreadModule threadManager = null; + private LoggingModel loggingModel = null; Review comment: LTRFeatureLoggerTransformerFactory.1 - `loggingModel` is a member of of the transformer factory here. ########## File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/response/transform/LTRFeatureLoggerTransformerFactory.java ########## @@ -208,55 +216,116 @@ public void setContext(ResultContext context) { if (threadManager != null) { threadManager.setExecutor(context.getRequest().getCore().getCoreContainer().getUpdateShardHandler().getUpdateExecutor()); } Review comment: LTRFeatureLoggerTransformerFactory.3 - I noted that `threadManager` here is an existing member of the transformer factory and it is initialised as part of request processing. Since there's no locking or anything there could be a chance that multiple threads concurrently call `threadManager.setExecutor()` but the argument to the set call is not specific to the request i.e. all requests would set the same thing (whereas for the logging model different requests could supply a different feature store name via the `fl=[feature store=...]` parameter). ########## File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/response/transform/LTRFeatureLoggerTransformerFactory.java ########## @@ -208,55 +216,116 @@ public void setContext(ResultContext context) { if (threadManager != null) { threadManager.setExecutor(context.getRequest().getCore().getCoreContainer().getUpdateShardHandler().getUpdateExecutor()); } - - // Setup LTRScoringQuery - scoringQuery = SolrQueryRequestContextUtils.getScoringQuery(req); - docsWereNotReranked = (scoringQuery == null); - String featureStoreName = SolrQueryRequestContextUtils.getFvStoreName(req); - if (docsWereNotReranked || (featureStoreName != null && (!featureStoreName.equals(scoringQuery.getScoringModel().getFeatureStoreName())))) { - // if store is set in the transformer we should overwrite the logger - final ManagedFeatureStore fr = ManagedFeatureStore.getManagedFeatureStore(req.getCore()); + LTRScoringQuery[] rerankingQueriesFromContext = SolrQueryRequestContextUtils.getScoringQueries(req); + docsWereNotReranked = (rerankingQueriesFromContext == null || rerankingQueriesFromContext.length == 0); + String transformerFeatureStore = SolrQueryRequestContextUtils.getFvStoreName(req); + Map<String, String[]> transformerExternalFeatureInfo = LTRQParserPlugin.extractEFIParams(localparams); - final FeatureStore store = fr.getFeatureStore(featureStoreName); - featureStoreName = store.getName(); // if featureStoreName was null before this gets actual name - - try { - final LoggingModel lm = new LoggingModel(loggingModelName, - featureStoreName, store.getFeatures()); + initLoggingModel(transformerFeatureStore); + setupRerankingQueriesForLogging(rerankingQueriesFromContext, transformerFeatureStore, transformerExternalFeatureInfo); + setupRerankingWeightsForLogging(context); + } + + private boolean isModelMatchingFeatureStore(String featureStoreName, LTRScoringModel model) { + return model != null && featureStoreName.equals(model.getFeatureStoreName()); + } - scoringQuery = new LTRScoringQuery(lm, - LTRQParserPlugin.extractEFIParams(localparams), - true, - threadManager); // request feature weights to be created for all features + /** + * The loggingModel is an empty model that is just used to extract the features + * and log them + * @param transformerFeatureStore the explicit transformer feature store + */ + private void initLoggingModel(String transformerFeatureStore) { + if (transformerFeatureStore == null || !isModelMatchingFeatureStore(transformerFeatureStore, loggingModel)) { + // if store is set in the transformer we should overwrite the logger + final ManagedFeatureStore fr = ManagedFeatureStore.getManagedFeatureStore(req.getCore()); - }catch (final Exception e) { - throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, - "retrieving the feature store "+featureStoreName, e); - } - } + final FeatureStore store = fr.getFeatureStore(transformerFeatureStore); + transformerFeatureStore = store.getName(); // if featureStoreName was null before this gets actual name - if (scoringQuery.getOriginalQuery() == null) { - scoringQuery.setOriginalQuery(context.getQuery()); + loggingModel = new LoggingModel(loggingModelName, + transformerFeatureStore, store.getFeatures()); } - if (scoringQuery.getFeatureLogger() == null){ - scoringQuery.setFeatureLogger( SolrQueryRequestContextUtils.getFeatureLogger(req) ); - } - scoringQuery.setRequest(req); - - featureLogger = scoringQuery.getFeatureLogger(); + } - try { - modelWeight = scoringQuery.createWeight(searcher, ScoreMode.COMPLETE, 1f); - } catch (final IOException e) { - throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, e.getMessage(), e); + /** + * When preparing the reranking queries for logging features various scenarios apply: + * + * No Reranking + * There is the need of a logger model from the default feature store/ the explicit feature store passed + * to extract the feature vector + * + * Re Ranking + * 1) If no explicit feature store is passed, the models for each reranking query can be safely re-used + * the feature vector can be fetched from the feature vector cache. + * 2) If an explicit feature store is passed, and no reranking query uses a model from that featureStore, + * There is the need of a logger model to extract the feature vector + * 3) If an explicit feature store is passed, and there is a reranking query that uses a model from that featureStore, + * It can be re-used + * + * @param rerankingQueriesFromContext reranking queries + * @param transformerFeatureStore explicit feature store for the transformer + * @param transformerExternalFeatureInfo explicit efi for the transformer + */ + private void setupRerankingQueriesForLogging(LTRScoringQuery[] rerankingQueriesFromContext, String transformerFeatureStore, Map<String, String[]> transformerExternalFeatureInfo) { + if (docsWereNotReranked) { //no reranking query + LTRScoringQuery loggingQuery = new LTRScoringQuery(loggingModel, + transformerExternalFeatureInfo, + true, + threadManager); + rerankingQueries = new LTRScoringQuery[]{loggingQuery}; + } else { + rerankingQueries = new LTRScoringQuery[rerankingQueriesFromContext.length]; + System.arraycopy(rerankingQueriesFromContext, 0, rerankingQueries, 0, rerankingQueriesFromContext.length); + + if (transformerFeatureStore != null) {// explicit feature store for the transformer + LTRScoringModel matchingRerankingModel = null; + for (LTRScoringQuery rerankingQuery : rerankingQueries) { + if (isModelMatchingFeatureStore(transformerFeatureStore, rerankingQuery.getScoringModel())) { Review comment: LTRFeatureLoggerTransformerFactory.5 - the `rerankingQuery` here could be an `OriginalRankingLTRScoringQuery` object which inherently has no feature store. The implementation here works because it relies on `rerankingQuery.getScoringModel()` returning `null` for `OriginalRankingLTRScoringQuery` objects and because `isModelMatchingFeatureStore` checks model null-ness. This is very subtle and it could be made more explicit for code clarity. ########## File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/response/transform/LTRFeatureLoggerTransformerFactory.java ########## @@ -208,55 +216,116 @@ public void setContext(ResultContext context) { if (threadManager != null) { threadManager.setExecutor(context.getRequest().getCore().getCoreContainer().getUpdateShardHandler().getUpdateExecutor()); } - - // Setup LTRScoringQuery - scoringQuery = SolrQueryRequestContextUtils.getScoringQuery(req); - docsWereNotReranked = (scoringQuery == null); - String featureStoreName = SolrQueryRequestContextUtils.getFvStoreName(req); - if (docsWereNotReranked || (featureStoreName != null && (!featureStoreName.equals(scoringQuery.getScoringModel().getFeatureStoreName())))) { - // if store is set in the transformer we should overwrite the logger - final ManagedFeatureStore fr = ManagedFeatureStore.getManagedFeatureStore(req.getCore()); + LTRScoringQuery[] rerankingQueriesFromContext = SolrQueryRequestContextUtils.getScoringQueries(req); + docsWereNotReranked = (rerankingQueriesFromContext == null || rerankingQueriesFromContext.length == 0); + String transformerFeatureStore = SolrQueryRequestContextUtils.getFvStoreName(req); + Map<String, String[]> transformerExternalFeatureInfo = LTRQParserPlugin.extractEFIParams(localparams); - final FeatureStore store = fr.getFeatureStore(featureStoreName); - featureStoreName = store.getName(); // if featureStoreName was null before this gets actual name - - try { - final LoggingModel lm = new LoggingModel(loggingModelName, - featureStoreName, store.getFeatures()); + initLoggingModel(transformerFeatureStore); + setupRerankingQueriesForLogging(rerankingQueriesFromContext, transformerFeatureStore, transformerExternalFeatureInfo); + setupRerankingWeightsForLogging(context); + } + + private boolean isModelMatchingFeatureStore(String featureStoreName, LTRScoringModel model) { + return model != null && featureStoreName.equals(model.getFeatureStoreName()); + } - scoringQuery = new LTRScoringQuery(lm, - LTRQParserPlugin.extractEFIParams(localparams), - true, - threadManager); // request feature weights to be created for all features + /** + * The loggingModel is an empty model that is just used to extract the features + * and log them + * @param transformerFeatureStore the explicit transformer feature store + */ + private void initLoggingModel(String transformerFeatureStore) { + if (transformerFeatureStore == null || !isModelMatchingFeatureStore(transformerFeatureStore, loggingModel)) { + // if store is set in the transformer we should overwrite the logger + final ManagedFeatureStore fr = ManagedFeatureStore.getManagedFeatureStore(req.getCore()); - }catch (final Exception e) { - throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, - "retrieving the feature store "+featureStoreName, e); - } - } + final FeatureStore store = fr.getFeatureStore(transformerFeatureStore); + transformerFeatureStore = store.getName(); // if featureStoreName was null before this gets actual name - if (scoringQuery.getOriginalQuery() == null) { - scoringQuery.setOriginalQuery(context.getQuery()); + loggingModel = new LoggingModel(loggingModelName, + transformerFeatureStore, store.getFeatures()); } - if (scoringQuery.getFeatureLogger() == null){ - scoringQuery.setFeatureLogger( SolrQueryRequestContextUtils.getFeatureLogger(req) ); - } - scoringQuery.setRequest(req); - - featureLogger = scoringQuery.getFeatureLogger(); + } - try { - modelWeight = scoringQuery.createWeight(searcher, ScoreMode.COMPLETE, 1f); - } catch (final IOException e) { - throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, e.getMessage(), e); + /** + * When preparing the reranking queries for logging features various scenarios apply: + * + * No Reranking + * There is the need of a logger model from the default feature store/ the explicit feature store passed + * to extract the feature vector + * + * Re Ranking + * 1) If no explicit feature store is passed, the models for each reranking query can be safely re-used + * the feature vector can be fetched from the feature vector cache. + * 2) If an explicit feature store is passed, and no reranking query uses a model from that featureStore, + * There is the need of a logger model to extract the feature vector + * 3) If an explicit feature store is passed, and there is a reranking query that uses a model from that featureStore, + * It can be re-used + * + * @param rerankingQueriesFromContext reranking queries + * @param transformerFeatureStore explicit feature store for the transformer + * @param transformerExternalFeatureInfo explicit efi for the transformer + */ Review comment: LTRFeatureLoggerTransformerFactory.4 - Very nice overview of the different code paths here. https://github.com/cpoerschke/lucene-solr/commit/3a61287a0e4fb5a77f080a92e7129582b234cbd7 suggests some tweaks to the implementation, I will annotate comments re: my thinking behind those suggestions. Let me know what you think? ########## File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/response/transform/LTRFeatureLoggerTransformerFactory.java ########## @@ -208,55 +216,116 @@ public void setContext(ResultContext context) { if (threadManager != null) { threadManager.setExecutor(context.getRequest().getCore().getCoreContainer().getUpdateShardHandler().getUpdateExecutor()); } - - // Setup LTRScoringQuery - scoringQuery = SolrQueryRequestContextUtils.getScoringQuery(req); - docsWereNotReranked = (scoringQuery == null); - String featureStoreName = SolrQueryRequestContextUtils.getFvStoreName(req); - if (docsWereNotReranked || (featureStoreName != null && (!featureStoreName.equals(scoringQuery.getScoringModel().getFeatureStoreName())))) { - // if store is set in the transformer we should overwrite the logger - final ManagedFeatureStore fr = ManagedFeatureStore.getManagedFeatureStore(req.getCore()); + LTRScoringQuery[] rerankingQueriesFromContext = SolrQueryRequestContextUtils.getScoringQueries(req); + docsWereNotReranked = (rerankingQueriesFromContext == null || rerankingQueriesFromContext.length == 0); + String transformerFeatureStore = SolrQueryRequestContextUtils.getFvStoreName(req); + Map<String, String[]> transformerExternalFeatureInfo = LTRQParserPlugin.extractEFIParams(localparams); - final FeatureStore store = fr.getFeatureStore(featureStoreName); - featureStoreName = store.getName(); // if featureStoreName was null before this gets actual name - - try { - final LoggingModel lm = new LoggingModel(loggingModelName, - featureStoreName, store.getFeatures()); + initLoggingModel(transformerFeatureStore); + setupRerankingQueriesForLogging(rerankingQueriesFromContext, transformerFeatureStore, transformerExternalFeatureInfo); + setupRerankingWeightsForLogging(context); + } + + private boolean isModelMatchingFeatureStore(String featureStoreName, LTRScoringModel model) { + return model != null && featureStoreName.equals(model.getFeatureStoreName()); + } - scoringQuery = new LTRScoringQuery(lm, - LTRQParserPlugin.extractEFIParams(localparams), - true, - threadManager); // request feature weights to be created for all features + /** + * The loggingModel is an empty model that is just used to extract the features + * and log them + * @param transformerFeatureStore the explicit transformer feature store + */ + private void initLoggingModel(String transformerFeatureStore) { + if (transformerFeatureStore == null || !isModelMatchingFeatureStore(transformerFeatureStore, loggingModel)) { + // if store is set in the transformer we should overwrite the logger + final ManagedFeatureStore fr = ManagedFeatureStore.getManagedFeatureStore(req.getCore()); - }catch (final Exception e) { - throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, - "retrieving the feature store "+featureStoreName, e); - } - } + final FeatureStore store = fr.getFeatureStore(transformerFeatureStore); + transformerFeatureStore = store.getName(); // if featureStoreName was null before this gets actual name - if (scoringQuery.getOriginalQuery() == null) { - scoringQuery.setOriginalQuery(context.getQuery()); + loggingModel = new LoggingModel(loggingModelName, + transformerFeatureStore, store.getFeatures()); } - if (scoringQuery.getFeatureLogger() == null){ - scoringQuery.setFeatureLogger( SolrQueryRequestContextUtils.getFeatureLogger(req) ); - } - scoringQuery.setRequest(req); - - featureLogger = scoringQuery.getFeatureLogger(); + } - try { - modelWeight = scoringQuery.createWeight(searcher, ScoreMode.COMPLETE, 1f); - } catch (final IOException e) { - throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, e.getMessage(), e); + /** + * When preparing the reranking queries for logging features various scenarios apply: + * + * No Reranking + * There is the need of a logger model from the default feature store/ the explicit feature store passed + * to extract the feature vector + * + * Re Ranking + * 1) If no explicit feature store is passed, the models for each reranking query can be safely re-used + * the feature vector can be fetched from the feature vector cache. + * 2) If an explicit feature store is passed, and no reranking query uses a model from that featureStore, + * There is the need of a logger model to extract the feature vector + * 3) If an explicit feature store is passed, and there is a reranking query that uses a model from that featureStore, + * It can be re-used + * + * @param rerankingQueriesFromContext reranking queries + * @param transformerFeatureStore explicit feature store for the transformer + * @param transformerExternalFeatureInfo explicit efi for the transformer + */ + private void setupRerankingQueriesForLogging(LTRScoringQuery[] rerankingQueriesFromContext, String transformerFeatureStore, Map<String, String[]> transformerExternalFeatureInfo) { + if (docsWereNotReranked) { //no reranking query + LTRScoringQuery loggingQuery = new LTRScoringQuery(loggingModel, + transformerExternalFeatureInfo, + true, + threadManager); + rerankingQueries = new LTRScoringQuery[]{loggingQuery}; + } else { + rerankingQueries = new LTRScoringQuery[rerankingQueriesFromContext.length]; + System.arraycopy(rerankingQueriesFromContext, 0, rerankingQueries, 0, rerankingQueriesFromContext.length); + + if (transformerFeatureStore != null) {// explicit feature store for the transformer + LTRScoringModel matchingRerankingModel = null; + for (LTRScoringQuery rerankingQuery : rerankingQueries) { + if (isModelMatchingFeatureStore(transformerFeatureStore, rerankingQuery.getScoringModel())) { + matchingRerankingModel = rerankingQuery.getScoringModel(); + } + } + if (matchingRerankingModel != null) {//one of the LTR query model can be re-used + for (LTRScoringQuery rerankingQuery : rerankingQueries) { + rerankingQuery.setLtrScoringModel(matchingRerankingModel); + } + } else { + for (LTRScoringQuery rerankingQuery : rerankingQueries) { + rerankingQuery.setLtrScoringModel(loggingModel); + rerankingQuery.setExtractAllFeatures(true); + rerankingQuery.setEfi(transformerExternalFeatureInfo); + } Review comment: LTRFeatureLoggerTransformerFactory.6 - the reranking queries from the context were copied and now they are being modified. The "why copy-and-modify instead construct-a-new-one?" question may arise for code readers. ########## File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/response/transform/LTRFeatureLoggerTransformerFactory.java ########## @@ -271,17 +340,24 @@ public void transform(SolrDocument doc, int docid) private void implTransform(SolrDocument doc, int docid, Float score) throws IOException { - Object fv = featureLogger.getFeatureVector(docid, scoringQuery, searcher); - if (fv == null) { // FV for this document was not in the cache - fv = featureLogger.makeFeatureVector( - LTRRescorer.extractFeaturesInfo( - modelWeight, - docid, - (docsWereNotReranked ? score : null), - leafContexts)); + LTRScoringQuery rerankingQuery = rerankingQueries[0]; + LTRScoringQuery.ModelWeight rerankingModelWeight = modelWeights[0]; + if (rerankingQueries.length > 1 && ((LTRInterleavingScoringQuery)rerankingQueries[1]).getPickedInterleavingDocIds().contains(docid)) { Review comment: LTRFeatureLoggerTransformerFactory.7 - here now the reranking queries that were copied from the context (and modified) are being used. It's very subtle but questions that may arise for code readers are (a) whether or not the copy happened before or after the picked interleaving doc ids were set, and/or (b) if the copy is shallow so that the query from the context and the query in the transformer share the same underlying picked ids object, and/or (c) if any shared underlying picked ids object remains shared when 'set' is called on it. https://github.com/cpoerschke/lucene-solr/commit/3a61287a0e4fb5a77f080a92e7129582b234cbd7 proposes to introduce (alongside the `rerankingQueries` array) a `rerankingQueriesFromContext` array which is used only for the "picked doc id" check which would avoid the "was it copied before or after the picked ids were set" questions. An extra implementation subtlety is as follows: get-picked-interleaving-doc-ids applies only to interleaving queries and because of that the "copy-and-modify" approach was required for `rerankingQueries` before but if the already existing queries from the context can be used for the picked-or-not check then the `rerankingQueries` can follow a "construct-a-new-one" approach instead with the constructed queries all being of the same `LTRScoringQuery` base class type. ########## File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/LTRScoringQuery.java ########## @@ -102,6 +102,26 @@ public LTRScoringModel getScoringModel() { return ltrScoringModel; } + public void setLtrScoringModel(LTRScoringModel ltrScoringModel) { + this.ltrScoringModel = ltrScoringModel; + } + + public void setLtrThreadMgr(LTRThreadModule ltrThreadMgr) { + this.ltrThreadMgr = ltrThreadMgr; + } + + public void setExtractAllFeatures(boolean extractAllFeatures) { + this.extractAllFeatures = extractAllFeatures; + } + + public void setEfi(Map<String, String[]> efi) { + this.efi = efi; + } Review comment: LTRFeatureLoggerTransformerFactory.8 - with a `rerankedQueriesFromContext` + `rerankedQueries` approach in the feature logger transformer the setters here would not be needed. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org