[ https://issues.apache.org/jira/browse/SOLR-10117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15903287#comment-15903287 ]
David Smiley commented on SOLR-10117:
-------------------------------------
bq. Would this deduplicate large fields replicated in multiple records?
No. If I were tasked to do that, I might implement a custom
DocValuesFormat that deduplicates per segment (it couldn't dedup across
segments) by using the value's length as a crude hash and then verifying the
dedup by re-reading the original bytes. There would be no query-time overhead;
duplicate docs would simply share the internal offset/length pointer.
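
To make that concrete, here is a minimal standalone sketch of the write-side dedup idea, with the segment's data file simulated by an in-memory map. All names here are hypothetical illustrations, not the actual DocValuesFormat SPI:

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Sketch of per-segment dedup at write time: the value's length acts as a
 * crude hash to find candidates, and each candidate is verified by re-reading
 * the bytes already written. A duplicate doc reuses the existing offset/length
 * pointer, so reads pay no extra cost. Hypothetical names, not Lucene API.
 */
public class LengthDedupWriter {

  /** Internal offset/length pointer into the segment's data blob. */
  record Pointer(long offset, int length) {}

  // Stands in for the segment's on-disk data file.
  private final Map<Pointer, byte[]> segmentData = new HashMap<>();
  // Crude hash index: length -> pointers to values already written at that length.
  private final Map<Integer, List<Pointer>> byLength = new HashMap<>();
  private long nextOffset = 0;

  /** Writes one per-document value; returns a (possibly shared) pointer to it. */
  public Pointer add(byte[] value) {
    for (Pointer candidate : byLength.getOrDefault(value.length, List.of())) {
      // Length matched; verify the dedup by re-reading the original bytes.
      if (Arrays.equals(segmentData.get(candidate), value)) {
        return candidate; // duplicate: share the existing offset/length pointer
      }
    }
    Pointer p = new Pointer(nextOffset, value.length);
    segmentData.put(p, value.clone());
    nextOffset += value.length;
    byLength.computeIfAbsent(value.length, k -> new ArrayList<>()).add(p);
    return p;
  }

  public static void main(String[] args) {
    LengthDedupWriter w = new LengthDedupWriter();
    Pointer a = w.add("big stored field".getBytes());
    Pointer b = w.add("big stored field".getBytes()); // duplicate value
    Pointer c = w.add("different content".getBytes());
    System.out.println(a.equals(b)); // true: second doc shares the first pointer
    System.out.println(a.equals(c)); // false
  }
}
{code}

Because verification compares the real bytes, a length collision between distinct values only costs one extra read at write time and can never produce a false dedup.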
> Big docs and the DocumentCache; umbrella issue
> ----------------------------------------------
>
> Key: SOLR-10117
> URL: https://issues.apache.org/jira/browse/SOLR-10117
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: David Smiley
> Assignee: David Smiley
> Attachments: SOLR_10117_large_fields.patch
>
>
> This is an umbrella issue for improved handling of large documents (large
> stored fields), generally related to the DocumentCache or SolrIndexSearcher's
> doc() methods. Highlighting is affected as it's the primary consumer of this
> data. "Large" here means multi-megabyte, especially tens or even hundreds of
> megabytes. We'd like to support such users without forcing them to choose
> between no DocumentCache (bad performance) and having one but hitting OOM due
> to massive Strings winding up in there. I've contemplated this for longer
> than I'd like to admit; it's a complicated issue with different concerns
> to balance.