kotman12 opened a new pull request, #4053: URL: https://github.com/apache/solr/pull/4053
https://issues.apache.org/jira/browse/SOLR-18071

# Description

Adds support for exporting stored-only fields (fields without docValues) in the `/export` request handler via a new `includeStoredFields` parameter. Previously, every field in the field list (`fl`) was required to have docValues enabled. This change lets users include stored fields that don't have docValues, which is useful, e.g., when exporting fields whose types don't support docValues or when exporting data that has already been indexed without docValues.

# Solution

If `fl` explicitly names a stored-only field and `includeStoredFields` is not enabled, the request fails with a 400 and a hint to add `includeStoredFields=true`. For glob patterns (e.g., `fl=*` or `fl=intdv,*`), stored-only fields are skipped unless `includeStoredFields=true`, to preserve backward compatibility.

The current implementation fetches DV-enabled fields from `StoredFields` when some stored fields have _already_ been requested. This avoids a docValues lookup for those fields, which makes sense since the `StoredFields` have to be parsed anyway. My (somewhat limited) benchmarks appear to corroborate that this is the best choice for performance.

A quirky thing about this implementation is that the very much internal `FieldWriter` API was changed to support more than one field. This makes it more interchangeable with the existing `StoredFieldVisitor` interface, which assumes one visitor for many fields. I landed on this rather than creating an adapter to bridge the two because it appeared to be simpler. It's worth stressing again that `FieldWriter` is very much internal to the export package and that the boolean it returned was effectively discarded (the local `fieldIndex` it drives isn't even used anywhere). It could be argued that `FieldWriter::write` could be `void`, and I'd also be open to such a change.

# Tests

- Explicit stored-only field export succeeds with `includeStoredFields=true` (single-valued and multi-valued).
- Explicit stored-only field export fails without the parameter, and the error includes the `includeStoredFields=true` hint and the field name.
- Glob `fl` skips stored-only fields without the parameter (request succeeds, stored-only fields not present).
- Glob `fl` includes stored-only fields with the parameter.
- Coverage for stored field types: string, int, long, float, double, boolean, date.

I also have some performance comparisons of exporting vs. not exporting stored fields:

- [stored-fields-export-writer-1k-doc-benchmark.txt](https://github.com/user-attachments/files/24655605/stored-fields-export-writer-1k-doc-benchmark.txt)
- [stored-fields-export-writer-140k-doc-benchmark.txt](https://github.com/user-attachments/files/24655606/stored-fields-export-writer-140k-doc-benchmark.txt)

# Checklist

Please review the following and check all that apply:

- [x] I have reviewed the guidelines for [How to Contribute](https://github.com/apache/solr/blob/main/CONTRIBUTING.md) and my code conforms to the standards described there to the best of my ability.
- [x] I have created a Jira issue and added the issue ID to my pull request title.
- [x] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended, not available for branches on forks living under an organisation)
- [x] I have developed this patch against the `main` branch.
- [x] I have run `./gradlew check`.
- [x] I have added tests for my changes.
- [x] I have added documentation for the [Reference Guide](https://github.com/apache/solr/tree/main/solr/solr-ref-guide)
- [x] I have added a [changelog entry](https://github.com/apache/solr/blob/main/dev-docs/changelog.adoc) for my change
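
To make the `fl` rules described under Solution concrete, here is a minimal, self-contained sketch of the decision logic. It is not the code in this PR: the class `ExportFieldSketch`, the method `resolve`, and the toy schema map (field name to "has docValues") are all hypothetical, and an `IllegalArgumentException` stands in for the 400 response.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical sketch of the fl resolution rules; not the actual export handler code. */
final class ExportFieldSketch {

  /**
   * @param fl comma-separated field list, possibly containing '*' globs
   * @param hasDocValues toy schema: field name -> true if docValues, false if stored-only
   * @param includeStoredFields value of the includeStoredFields request parameter
   * @return the fields that would be exported, in first-seen order
   */
  static List<String> resolve(
      String fl, Map<String, Boolean> hasDocValues, boolean includeStoredFields) {
    Set<String> out = new LinkedHashSet<>();
    for (String raw : fl.split(",")) {
      String token = raw.trim();
      if (token.isEmpty()) {
        continue;
      }
      if (token.contains("*")) {
        // Glob: stored-only matches are silently skipped unless the flag is set.
        Pattern glob = globToPattern(token);
        for (Map.Entry<String, Boolean> e : hasDocValues.entrySet()) {
          if (glob.matcher(e.getKey()).matches() && (e.getValue() || includeStoredFields)) {
            out.add(e.getKey());
          }
        }
      } else {
        // Explicit name: a stored-only field without the flag is an error.
        Boolean dv = hasDocValues.get(token);
        if (dv == null) {
          throw new IllegalArgumentException("unknown field: " + token);
        }
        if (!dv && !includeStoredFields) {
          throw new IllegalArgumentException(
              "field '" + token + "' has no docValues; add includeStoredFields=true");
        }
        out.add(token);
      }
    }
    return new ArrayList<>(out);
  }

  /** Translates a '*' glob into a regex, quoting every other character literally. */
  static Pattern globToPattern(String glob) {
    StringBuilder sb = new StringBuilder();
    for (char c : glob.toCharArray()) {
      sb.append(c == '*' ? ".*" : Pattern.quote(String.valueOf(c)));
    }
    return Pattern.compile(sb.toString());
  }

  public static void main(String[] args) {
    Map<String, Boolean> schema = new LinkedHashMap<>();
    schema.put("intdv", true); // docValues field
    schema.put("str_stored", false); // stored-only field
    System.out.println(resolve("intdv,*", schema, false));
    System.out.println(resolve("intdv,*", schema, true));
  }
}
```

With this toy schema, the glob request without the flag yields only the docValues field, while the same request with the flag also yields the stored-only field. Naming `str_stored` explicitly without the flag throws, mirroring the 400 described above.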
