Re: [PR] SOLR-17022: Support for glob patterns for fields in Export handler, Stream handler and with SelectStream streaming expression [solr]

via GitHub Wed, 11 Oct 2023 07:55:54 -0700


justinrsweeney commented on code in PR #1996:
URL: https://github.com/apache/solr/pull/1996#discussion_r1355153153



##########
solr/core/src/java/org/apache/solr/handler/export/ExportWriter.java:
##########
@@ -844,4 +844,61 @@ public String getMessage() {
       return "Early Client Disconnect";
     }
   }
+
+  /**
+   * Creates a complete field list using the provided field list by expanding 
any glob patterns into
+   * field names
+   *
+   * @param fields the original set of fields provided
+   * @param searcher an index searcher to access schema info
+   * @return a complete list of fields included any fields matching glob 
patterns
+   * @throws IOException if a provided field does not exist or cannot be 
retrieved from the schema
+   *     info
+   */
+  private List<SchemaField> expandFieldList(String[] fields, SolrIndexSearcher 
searcher)
+      throws IOException {
+    List<SchemaField> expandedFields = new ArrayList<>(fields.length);
+    Set<String> fieldsProcessed = new HashSet<>();
+    for (String field : fields) {
+      try {
+        if (field.contains("*")) {
+          getGlobFields(field, searcher, fieldsProcessed, expandedFields);
+        } else {
+          if (fieldsProcessed.add(field)) {
+            expandedFields.add(searcher.getSchema().getField(field));
+          }
+        }
+      } catch (Exception e) {
+        throw new IOException(e);
+      }
+    }
+
+    return expandedFields;
+  }
+
+  /**
+   * Create a list of schema fields that match a given glob pattern
+   *
+   * @param fieldPattern the glob pattern to match
+   * @param searcher an index search to access schema info
+   * @param fieldsProcessed the set of field names already processed to avoid 
duplicating
+   * @param expandedFields the list of fields to add expanded field names into
+   */
+  private void getGlobFields(
+      String fieldPattern,
+      SolrIndexSearcher searcher,
+      Set<String> fieldsProcessed,
+      List<SchemaField> expandedFields) {
+    for (FieldInfo fi : searcher.getFieldInfos()) {
+      if (GlobPatternUtil.matches(fieldPattern, fi.getName())) {
+        SchemaField schemaField = searcher.getSchema().getField(fi.getName());
+        if (fieldsProcessed.add(fi.getName())
+            && schemaField.hasDocValues()
+            && (!(schemaField.getType() instanceof SortableTextField)

Review Comment:
   Wanted to highlight this because @magibney and I were discussing this 
previously. The way this is implemented for the Export handler here actually 
deviates from how glob patterns work for the Select handler.
   
   The Select handler will not match any fields where 
`useDocValuesAsStored=false` for glob patterns, those fields must be explicitly 
provided in the field list.
   
   For the purposes of Export I did not follow that pattern and instead will 
match any fields that have docvalues enabled with the exception of 
SortableTextField which require `useDocValuesAsStored=true` to be used in 
Export in other places in the code.
   
   My reasoning here is that there is a performance hit for getting non-stored, 
docvalues enabled fields in the Select handler that doesn't seem to be to be as 
impactful in the Export handler.
   
   Open to other opinions on this topic, should we:
   1. Have glob patterns for fields in Export handler return any fields that 
are able to be exported
   2. Match the behavior in the Select handler and only return fields that 
match the pattern, have docvalues enabled AND have `useDocValuesAsStored=true`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] SOLR-17022: Support for glob patterns for fields in Export handler, Stream handler and with SelectStream streaming expression [solr]

Reply via email to