[ https://issues.apache.org/jira/browse/SOLR-10117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
David Smiley updated SOLR-10117:
--------------------------------
Attachment: SOLR_10117_large_fields.patch
Here's a patch that adds a "large" boolean property to schema fields. Such
fields, in conjunction with lazy field loading, will get their own separate
lazy document. _This is one piece of a larger puzzle._
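To illustrate the idea, here is a simplified, self-contained Java model. It is not Solr's actual code: {{LazyDoc}}, {{Lazy}}, and the {{reads}} counter are hypothetical names invented for this sketch. Ordinary lazy fields share one lazily loaded group. Each field declared large gets its own separate lazy holder, so reading {{title}} never materializes the multi-megabyte {{body}}.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

public class LargeFieldLazyLoadSketch {

    /** Counts simulated stored-field reads so we can see what laziness avoided. */
    static int reads = 0;

    /** Memoizing holder: the loader runs at most once, and only when get() is called. */
    static final class Lazy<T> {
        private final Supplier<T> loader;
        private T value;
        private boolean loaded;

        Lazy(Supplier<T> loader) { this.loader = loader; }

        T get() {
            if (!loaded) { value = loader.get(); loaded = true; }
            return value;
        }

        boolean isLoaded() { return loaded; }
    }

    /**
     * Hypothetical lazy document: all ordinary fields share ONE lazy group,
     * while every field declared "large" gets its own separate lazy holder.
     */
    static final class LazyDoc {
        private final Lazy<Map<String, String>> ordinary;
        private final Map<String, Lazy<String>> large = new HashMap<>();

        LazyDoc(Map<String, String> stored, Set<String> largeFieldNames) {
            // One simulated read loads every non-large field together.
            ordinary = new Lazy<>(() -> {
                reads++;
                Map<String, String> m = new HashMap<>(stored);
                m.keySet().removeAll(largeFieldNames);
                return m;
            });
            // Each large field is read separately, and only if someone asks for it.
            for (String name : largeFieldNames) {
                large.put(name, new Lazy<>(() -> { reads++; return stored.get(name); }));
            }
        }

        String get(String field) {
            Lazy<String> l = large.get(field);
            return l != null ? l.get() : ordinary.get().get(field);
        }

        boolean isLargeLoaded(String field) { return large.get(field).isLoaded(); }
    }

    public static void main(String[] args) {
        Map<String, String> stored = new LinkedHashMap<>();
        stored.put("id", "doc1");
        stored.put("title", "Big docs");
        stored.put("body", "x".repeat(5_000_000)); // stands in for a multi-megabyte value
        LazyDoc doc = new LazyDoc(stored, Set.of("body"));

        System.out.println(doc.get("title"));          // loads only the ordinary group
        System.out.println(doc.isLargeLoaded("body")); // body has not been read
        System.out.println(reads);                     // one simulated read so far
    }
}
```

The design choice mirrors the patch's intent: a request for small fields (or a cache entry holding them) does not drag the large value along with it.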
It would be awesome to _instead_ detect large-ness dynamically, without
having to declare fields as large as I'm doing here, but that has some issues.
First, if lazy field loading wanted to know a field's size, it would have to incur
the cost of loading the value, which defeats the point. A possible solution I'm in
favor of (though it might be controversial?) would be for Solr's DocumentBuilder to
detect large string values and, if there are any, add their field names to
a proposed docValues {{\_largeFields\_}} field. Lazy field loading could then
examine those values. An alternative variation is to save the names as a stored
value that comes first in the stored document, since lazy field loading has to
go to disk for this anyway. I actually like that better than using docValues.
DocumentBuilder might even go further and place the largest field value(s) last to
benefit from a Lucene-level optimization I got in a while back.
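The stored-value variation could look roughly like the following sketch. Several parts are assumptions. It is not Solr's DocumentBuilder API. The {{orderForStorage}} method is hypothetical, the length threshold is arbitrary, and the comma-separated value format for {{\_largeFields\_}} is invented for illustration. Multi-valued fields are ignored. The sketch writes the marker field first, then ordinary fields, then large fields in ascending size order so the largest value comes last.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LargeFieldOrderingSketch {

    /** Name of the proposed marker field; its comma-separated value format is an assumption. */
    static final String LARGE_FIELDS = "_largeFields_";

    /**
     * Hypothetical helper: returns stored fields in write order. The _largeFields_ marker
     * comes first (only if some value reaches the threshold), then ordinary fields in their
     * original order, then large fields sorted by length so the largest is written last.
     */
    static List<Map.Entry<String, String>> orderForStorage(Map<String, String> doc, int threshold) {
        List<Map.Entry<String, String>> small = new ArrayList<>();
        List<Map.Entry<String, String>> large = new ArrayList<>();
        for (Map.Entry<String, String> e : doc.entrySet()) {
            (e.getValue().length() >= threshold ? large : small).add(e);
        }
        // List.sort is stable, so equal-length large fields keep their original order.
        large.sort(Comparator.comparingInt((Map.Entry<String, String> e) -> e.getValue().length()));

        List<Map.Entry<String, String>> out = new ArrayList<>();
        if (!large.isEmpty()) {
            List<String> names = new ArrayList<>();
            for (Map.Entry<String, String> e : large) {
                names.add(e.getKey());
            }
            // Written first, so a lazy loader can learn which fields are large cheaply.
            out.add(Map.entry(LARGE_FIELDS, String.join(",", names)));
        }
        out.addAll(small);
        out.addAll(large);
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("body", "b".repeat(300));
        doc.put("id", "doc1");
        doc.put("attachment", "a".repeat(500));
        doc.put("title", "Big docs");
        // A tiny threshold keeps the demo small; real "large" values would be megabytes.
        for (Map.Entry<String, String> e : orderForStorage(doc, 100)) {
            System.out.println(e.getKey());
        }
    }
}
```

With the sample document, the write order is {{\_largeFields\_}}, {{id}}, {{title}}, {{body}}, {{attachment}}.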
Notice that this LargeFieldsTest uses some testing techniques I want to popularize
and refine. There are no new schema/solrconfig files even though the test
defines fields and field types and makes solrconfig changes, and it doesn't copy
configs to do this.
> Big docs and the DocumentCache; umbrella issue
> ----------------------------------------------
>
> Key: SOLR-10117
> URL: https://issues.apache.org/jira/browse/SOLR-10117
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: David Smiley
> Assignee: David Smiley
> Attachments: SOLR_10117_large_fields.patch
>
>
> This is an umbrella issue for improved handling of large documents (large
> stored fields), generally related to the DocumentCache or SolrIndexSearcher's
> doc() methods. Highlighting is affected because it's the primary consumer of this
> data. "Large" here means multi-megabyte, especially tens or even hundreds of
> megabytes. We'd like to support such users without forcing them to choose
> between having no DocumentCache (bad performance) and having one but hitting OOM
> because massive Strings wind up in it. I've contemplated this for longer
> than I'd like to admit, and it's a complicated issue with different concerns
> to balance.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)