[ https://issues.apache.org/jira/browse/SOLR-10117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15903287#comment-15903287 ]
David Smiley commented on SOLR-10117:
-------------------------------------

bq. Would this deduplicate large fields replicated in multiple records?

No. If I were tasked to do that, I might implement a customized DocValuesFormat that deduplicated per segment (it could not dedup at higher tiers) by using the value's length as a crude hash and then verifying the match by re-reading the original bytes. Query time would incur no overhead; duplicates would simply share the internal offset/length pointer.

> Big docs and the DocumentCache; umbrella issue
> ----------------------------------------------
>
>                 Key: SOLR-10117
>                 URL: https://issues.apache.org/jira/browse/SOLR-10117
>             Project: Solr
>          Issue Type: Improvement
>   Security Level: Public (Default Security Level. Issues are Public)
>            Reporter: David Smiley
>            Assignee: David Smiley
>         Attachments: SOLR_10117_large_fields.patch
>
>
> This is an umbrella issue for improved handling of large documents (large stored fields), generally related to the DocumentCache or SolrIndexSearcher's doc() methods. Highlighting is affected, as it is the primary consumer of this data. "Large" here means multi-megabyte, especially tens or even hundreds of megabytes. We'd like to support such users without forcing them to choose between no DocumentCache (bad performance) or having one but hitting OOM due to massive Strings winding up in there. I've contemplated this for longer than I'd like to admit, and it's a complicated issue with differing concerns to balance.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
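The dedup idea in the comment above (length as a crude hash, verified by re-reading the original bytes, with duplicates sharing one offset/length pointer) could be sketched roughly as follows. This is a hypothetical in-memory illustration, not real Lucene DocValuesFormat code; the class and method names (`DedupValueWriter`, `Pointer`, `add`) are invented for the example, and a real implementation would write through `IndexOutput`/`IndexInput` rather than Java lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of per-segment stored-value dedup:
 * the value's length serves as a crude hash; a candidate match is
 * verified by re-reading and comparing the original bytes; verified
 * duplicates share the existing offset/length pointer.
 */
class DedupValueWriter {
    /** In-memory stand-in for the segment's data file. */
    private final List<byte[]> data = new ArrayList<>();
    private long nextOffset = 0;

    /** Internal pointer into the segment data: offset plus length. */
    record Pointer(long offset, int length, int slot) {}

    // Crude hash table: value length -> pointers to values of that length.
    private final Map<Integer, List<Pointer>> byLength = new HashMap<>();

    /** Adds a value, returning a (possibly shared) pointer to its bytes. */
    Pointer add(byte[] value) {
        List<Pointer> candidates =
                byLength.computeIfAbsent(value.length, k -> new ArrayList<>());
        for (Pointer p : candidates) {
            // The length matched; verify by re-reading the stored bytes.
            if (Arrays.equals(data.get(p.slot()), value)) {
                return p; // duplicate: share the existing offset/length pointer
            }
        }
        // Not a duplicate: append the bytes and record a new pointer.
        Pointer p = new Pointer(nextOffset, value.length, data.size());
        data.add(value);
        nextOffset += value.length;
        candidates.add(p);
        return p;
    }
}
```

Because dedup happens at write time, reads pay nothing extra: two documents whose large field deduplicated simply resolve to the same offset/length, which matches the "no overhead at query time" claim in the comment.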