[jira] [Updated] (SOLR-10117) Big docs and the DocumentCache; umbrella issue

David Smiley (JIRA) Tue, 14 Feb 2017 16:14:05 -0800

     [ 
https://issues.apache.org/jira/browse/SOLR-10117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


David Smiley updated SOLR-10117:
--------------------------------
    Attachment: SOLR_10117_large_fields.patch

Here's a patch that adds a "large" boolean property to schema fields.  Such 
fields, in conjunction with lazy field loading, will get their own separate 
lazy document.  _This is one piece of a larger puzzle._ 

It would be awesome to _instead_ detect large-ness dynamically, without 
having to declare fields as large as I'm doing here, but that has some issues.  
First, if lazy field loading wants to know a value's size, it has to incur 
the cost of loading it, which defeats the point.  A possible solution I'm in 
favor of (yet might be controversial?) would be for Solr's DocumentBuilder to 
detect large string values and, if there are any, add the field names to 
a proposed docValues {{\_largeFields\_}} field.  Then lazy field loading could 
examine the values.  An alternative variation on this is to save it as a stored 
value that comes first in the stored document, since lazy field loading has to 
go to disk for this anyway.  I actually like that better than using docValues.  
It might even go further and place the largest field value(s) last to benefit 
from a Lucene-level optimization I got in a while back.
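
For illustration only, the detection-and-ordering idea above could be sketched 
in plain Java roughly as follows.  This is not Solr's DocumentBuilder and not 
code from the attached patch: the class name, the method names, the use of a 
Map as the document, and the character-count threshold are all assumptions 
made up for the sketch.  Only the {{\_largeFields\_}} name comes from the 
proposal.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; not Solr's DocumentBuilder and not part of the patch. */
public class LargeFieldSketch {

    /** Marker field name taken from the proposal above. */
    public static final String LARGE_FIELDS = "_largeFields_";

    /** Names of fields whose String value has at least thresholdChars characters. */
    public static List<String> findLargeFields(Map<String, Object> doc, int thresholdChars) {
        List<String> large = new ArrayList<>();
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            Object v = e.getValue();
            if (v instanceof String && ((String) v).length() >= thresholdChars) {
                large.add(e.getKey());
            }
        }
        return large;
    }

    /**
     * Returns a copy of the document laid out as: the marker field first (only
     * when some field is large), then the ordinary fields in their original
     * order, then the large fields from smallest to largest.
     */
    public static Map<String, Object> reorder(Map<String, Object> doc, int thresholdChars) {
        List<String> large = findLargeFields(doc, thresholdChars);
        Map<String, Object> out = new LinkedHashMap<>();
        if (!large.isEmpty()) {
            // Copy the list so the marker keeps the original field order.
            out.put(LARGE_FIELDS, new ArrayList<>(large));
        }
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            if (!large.contains(e.getKey())) {
                out.put(e.getKey(), e.getValue());
            }
        }
        // Largest value goes last, per the ordering idea described above.
        large.sort(Comparator.comparingInt((String name) -> ((String) doc.get(name)).length()));
        for (String name : large) {
            out.put(name, doc.get(name));
        }
        return out;
    }
}
```

With the marker stored first, a reader of the stored document would see which 
fields are large before reaching any of their values.  Whether that is cheaper 
in practice than a docValues lookup is exactly the open question raised above.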

Notice that this LargeFieldsTest uses some testing techniques I want to 
popularize and refine.  There are no new schema/solrconfig files, even though 
this test defines fields and field types and makes solrconfig changes.  And it 
doesn't copy configs to do this.

> Big docs and the DocumentCache; umbrella issue
> ----------------------------------------------
>
>                 Key: SOLR-10117
>                 URL: https://issues.apache.org/jira/browse/SOLR-10117
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: David Smiley
>            Assignee: David Smiley
>         Attachments: SOLR_10117_large_fields.patch
>
>
> This is an umbrella issue for improved handling of large documents (large 
> stored fields), generally related to the DocumentCache or SolrIndexSearcher's 
> doc() methods.  Highlighting is affected as it's the primary consumer of this 
> data.  "Large" here is multi-megabyte, especially tens or even hundreds of 
> megabytes. We'd like to support such users without forcing them to choose 
> between no DocumentCache (bad performance), or having one but hitting OOM due 
> to massive Strings winding up in there.  I've contemplated this for longer 
> than I'd like to admit and it's a complicated issue with different concerns 
> to balance.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
