[ https://issues.apache.org/jira/browse/SOLR-10117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15903287#comment-15903287 ]

David Smiley commented on SOLR-10117:
-------------------------------------

bq. Would this deduplicate large fields replicated in multiple records?

No.  If I were tasked to do that, I might implement a customized 
DocValuesFormat that deduplicates per segment (it could not dedup at higher 
tiers) by using the value's length as a crude hash and then verifying each 
candidate match by re-reading the original bytes.  There would be no 
query-time overhead; duplicate values would simply share the internal 
offset/length pointer.
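
To make that concrete, below is a minimal, hypothetical sketch of the 
write-side dedup logic.  This is not real DocValuesFormat code; the names 
(PerSegmentDedupWriter, Slice, etc.) are invented for illustration, and a 
heap buffer stands in for the segment's data file:

{code:java}
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: the value's length serves as a crude hash, candidate
 * matches are verified by re-reading the previously written bytes, and
 * duplicates share one (offset, length) pointer into the segment's data.
 */
public class PerSegmentDedupWriter {

  /** Pointer into the segment's concatenated value data. */
  public record Slice(long offset, int length) {}

  // stand-in for the segment's data file
  private final ByteArrayOutputStream data = new ByteArrayOutputStream();
  // crude "hash" table: value length -> slices already written with that length
  private final Map<Integer, List<Slice>> byLength = new HashMap<>();

  /** Writes value, or returns the existing slice if an identical value was written before. */
  public Slice write(byte[] value) {
    List<Slice> candidates = byLength.computeIfAbsent(value.length, k -> new ArrayList<>());
    for (Slice s : candidates) {
      // length matched; verify the dedup by re-reading the original bytes
      if (Arrays.equals(read(s), value)) {
        return s; // dedup hit: share the existing offset/length pointer
      }
    }
    Slice s = new Slice(data.size(), value.length);
    data.writeBytes(value);
    candidates.add(s);
    return s;
  }

  // a real codec would seek+read the data file; copying the buffer is sketch-only
  private byte[] read(Slice s) {
    return Arrays.copyOfRange(data.toByteArray(), (int) s.offset(), (int) s.offset() + s.length());
  }

  public static void main(String[] args) {
    PerSegmentDedupWriter w = new PerSegmentDedupWriter();
    byte[] v = "some big stored field value".getBytes(StandardCharsets.UTF_8);
    Slice first = w.write(v);
    Slice second = w.write(v.clone());
    System.out.println(first.equals(second)); // true: the duplicate shares the pointer
  }
}
{code}

Because the length table lives per segment, identical values in different 
segments would still be stored twice, which is the "could not dedup at higher 
tiers" limitation noted above.  Reads stay free because a dedup'd document 
simply resolves to the same offset/length as the first occurrence.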

> Big docs and the DocumentCache; umbrella issue
> ----------------------------------------------
>
>                 Key: SOLR-10117
>                 URL: https://issues.apache.org/jira/browse/SOLR-10117
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: David Smiley
>            Assignee: David Smiley
>         Attachments: SOLR_10117_large_fields.patch
>
>
> This is an umbrella issue for improved handling of large documents (large 
> stored fields), generally related to the DocumentCache or SolrIndexSearcher's 
> doc() methods.  Highlighting is affected as it's the primary consumer of this 
> data.  "Large" here means multi-megabyte, especially tens or even hundreds of 
> megabytes.  We'd like to support such users without forcing them to choose 
> between no DocumentCache (bad performance) or having one but hitting OOM due 
> to massive Strings winding up in there.  I've contemplated this for longer 
> than I'd like to admit; it's a complicated issue with different concerns 
> to balance.


