kotman12 opened a new pull request, #4053:
URL: https://github.com/apache/solr/pull/4053

   https://issues.apache.org/jira/browse/SOLR-18071
   
   # Description
   
   Adds support for exporting stored-only fields (fields without docValues) in 
the `/export` request handler via a new `includeStoredFields` parameter. 
Previously, every field in the field list (`fl`) was required to have docValues 
enabled. This change allows users to include stored fields that don't have 
docValues, which is useful, for example, when exporting field types that don't 
support docValues or data that was already indexed without them.
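
For illustration, an export request that opts in to stored-only fields might be built as in the sketch below. Only the `includeStoredFields` parameter comes from this PR; the host, the collection name `techproducts`, the field names, and the helper class are assumptions made for the example.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical helper, not part of the patch: builds an /export URL. */
final class ExportRequestExample {

  static String exportUrl(
      String baseUrl, String collection, String fl, boolean includeStoredFields) {
    // /export needs a query, a sort and a field list.
    Map<String, String> params = new LinkedHashMap<>();
    params.put("q", "*:*");
    params.put("sort", "id asc");
    params.put("fl", fl);
    if (includeStoredFields) {
      // New parameter introduced by this PR.
      params.put("includeStoredFields", "true");
    }
    String query =
        params.entrySet().stream()
            .map(e -> e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
            .collect(Collectors.joining("&"));
    return baseUrl + "/" + collection + "/export?" + query;
  }

  public static void main(String[] args) {
    // Assumed local Solr instance and collection name.
    System.out.println(exportUrl("http://localhost:8983/solr", "techproducts", "id,title", true));
  }
}
```

Leaving the flag off produces the same URL without the trailing `includeStoredFields=true`, which corresponds to the pre-existing docValues-only behavior.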
   
   # Solution
   
   If `fl` explicitly names a stored-only field and `includeStoredFields` is not 
enabled, the request fails with a 400 and a hint to add 
`includeStoredFields=true`. For glob patterns (e.g., fl=\* or fl=intdv,\*), 
stored-only fields are skipped unless `includeStoredFields=true`, to preserve 
backward compatibility. When at least one stored-only field has been requested, 
the current implementation also reads the DV-enabled fields from `StoredFields` 
rather than from docValues. This avoids a separate DV lookup per field, which 
makes sense since the `StoredFields` have to be parsed anyway. My (somewhat 
limited) benchmarks suggest that this is the best choice for performance.
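
The field-list rules above can be sketched roughly as follows. This is a simplified stand-in and not the code in the patch: the class name, the method signature, and the use of `IllegalArgumentException` in place of a 400 response are all assumptions made to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical illustration of the fl rules; not the classes touched by this PR. */
final class ExportFieldResolver {

  static List<String> resolve(
      String fl, List<String> dvFields, List<String> storedOnlyFields, boolean includeStoredFields) {
    Set<String> out = new LinkedHashSet<>();
    for (String raw : fl.split(",")) {
      String token = raw.trim();
      if (token.isEmpty()) {
        continue;
      }
      if (token.contains("*")) {
        // Glob: docValues fields always match; stored-only fields only when opted in.
        Pattern glob = Pattern.compile(globToRegex(token));
        for (String f : dvFields) {
          if (glob.matcher(f).matches()) out.add(f);
        }
        if (includeStoredFields) {
          for (String f : storedOnlyFields) {
            if (glob.matcher(f).matches()) out.add(f);
          }
        }
      } else if (dvFields.contains(token)) {
        out.add(token);
      } else if (storedOnlyFields.contains(token)) {
        // Explicit stored-only field: reject unless the flag is set (stands in for a 400).
        if (!includeStoredFields) {
          throw new IllegalArgumentException(
              "Field '" + token + "' has no docValues; add includeStoredFields=true");
        }
        out.add(token);
      } else {
        throw new IllegalArgumentException("Unknown field '" + token + "'");
      }
    }
    return new ArrayList<>(out);
  }

  /** Converts a simple '*' glob into a regex, quoting every literal part. */
  private static String globToRegex(String glob) {
    StringBuilder sb = new StringBuilder();
    String[] parts = glob.split("\\*", -1);
    for (int i = 0; i < parts.length; i++) {
      if (i > 0) sb.append(".*");
      sb.append(Pattern.quote(parts[i]));
    }
    return sb.toString();
  }
}
```

With docValues fields `id` and `intdv` and a stored-only field `title`, `fl=intdv,*` resolves to `intdv, id` without the flag and to `intdv, id, title` with it, while an explicit `fl=title` without the flag is rejected.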
   
   A quirky thing about this implementation is that the very much internal 
`FieldWriter` API was changed to support more than one field. This makes it 
more interchangeable with the existing `StoredFieldVisitor` API, which uses a 
single visitor for many fields. I landed on this rather than creating an 
adapter to bridge the two, as it appeared to be simpler. It's worth stressing 
again that `FieldWriter` is very much internal to the export package and that 
the boolean it returned was effectively discarded (the local `fieldIndex` it 
drives isn't even used anywhere). It could be argued that `FieldWriter::write` 
should be `void`, and I'd also be open to such a change.
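
To make the "one writer, many fields" shape concrete, here is a deliberately simplified sketch. It does not reproduce the real `FieldWriter` signature from the export package; the `SketchFieldWriter` interface, its `int` return value, and the map-based document are hypothetical and only illustrate the idea.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; the real FieldWriter in the export package differs. */
final class MultiFieldWriterSketch {

  /** A per-document writer that may emit several fields in one call. */
  interface SketchFieldWriter {
    /** Copies zero or more fields of doc into out and returns how many were written. */
    int write(Map<String, Object> doc, Map<String, Object> out);
  }

  /** One writer instance covering many stored fields, similar to one visitor per document. */
  static SketchFieldWriter storedFields(List<String> names) {
    return (doc, out) -> {
      int written = 0;
      for (String name : names) {
        Object value = doc.get(name);
        if (value != null) {
          out.put(name, value);
          written++;
        }
      }
      return written;
    };
  }
}
```

A single-field writer is then just the special case where `names` has one element, which is why an adapter between the two styles is not strictly needed in this sketch.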
   
   # Tests
   
   - Explicit stored-only field export succeeds with includeStoredFields=true 
(single-valued and multi-valued).
   - Explicit stored-only field export fails without the parameter and includes 
the includeStoredFields=true hint and field name.
   - Glob fl skips stored-only fields without the parameter (request succeeds, 
stored-only fields not present).
   - Glob fl includes stored-only fields with the parameter.
   - Coverage for stored field types: string, int, long, float, double, 
boolean, date.
   
   I also have some performance comparisons of exports with and without stored 
fields:
   
[stored-fields-export-writer-1k-doc-benchmark.txt](https://github.com/user-attachments/files/24655605/stored-fields-export-writer-1k-doc-benchmark.txt)
   
[stored-fields-export-writer-140k-doc-benchmark.txt](https://github.com/user-attachments/files/24655606/stored-fields-export-writer-140k-doc-benchmark.txt)
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [x] I have reviewed the guidelines for [How to 
Contribute](https://github.com/apache/solr/blob/main/CONTRIBUTING.md) and my 
code conforms to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [x] I have given Solr maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended, not available for 
branches on forks living under an organisation)
   - [x] I have developed this patch against the `main` branch.
   - [x] I have run `./gradlew check`.
   - [x] I have added tests for my changes.
   - [x] I have added documentation for the [Reference 
Guide](https://github.com/apache/solr/tree/main/solr/solr-ref-guide)
   - [x] I have added a [changelog 
entry](https://github.com/apache/solr/blob/main/dev-docs/changelog.adoc) for my 
change
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

