[ 
https://issues.apache.org/jira/browse/SOLR-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev resolved SOLR-9668.
------------------------------------
       Resolution: Fixed
    Fix Version/s: 6.4

> Support cursor paging in SolrEntityProcessor
> --------------------------------------------
>
>                 Key: SOLR-9668
>                 URL: https://issues.apache.org/jira/browse/SOLR-9668
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: contrib - DataImportHandler
>            Reporter: Yegor Kozlov
>            Assignee: Mikhail Khludnev
>            Priority: Minor
>              Labels: dataimportHandler
>             Fix For: master (7.0), 6.4
>
>         Attachments: SOLR-9668.patch, SOLR-9668.patch
>
>
> SolrEntityProcessor paginates using the start and rows parameters, which can 
> be very inefficient at large offsets. In fact, the current implementation is 
> impractical for importing large amounts of data (10M+ documents): the data 
> import rate degrades from 1000 docs/second to 10 docs/second and the import 
> gets stuck.
> This patch introduces support for cursor paging, which offers roughly 
> constant per-page performance. In my tests, the time to fetch the 1st and 
> 1000th pages was about the same, and the data import rate was stable 
> throughout the entire import. 
> To enable cursor paging, a user needs to:
>  * add the {{cursorMark='true'}} (!) attribute to the entity configuration;
>  * add a "sort" attribute to the entity configuration (see the note about 
> sort at https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results);
>  * remove the {{timeout}} attribute.
> {code}
> <?xml version="1.0" encoding="UTF-8" ?>
> <dataConfig>
>   <document>
>     <entity name="se" processor="SolrEntityProcessor" 
>     query="*:*"
>     rows="1000"
>     cursorMark='true'
>     sort="id asc"  
>     url="http://localhost:8983/solr/collection1">
>     </entity>
>   </document>
> </dataConfig>
> {code}
> If the {{cursorMark}} attribute is missing or is not {{'true'}}, the 
> default start/rows pagination is used.
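The description attributes the slowdown to deep offsets and says that {{cursorMark='true'}} switches the processor to cursor paging, with start/rows as the fallback. The Java sketch below models both paths in memory under stated assumptions. FakeIndex, fetchAll and the visited counter are hypothetical names, not SolrEntityProcessor's API, and the "cursor" here is simply the last id returned, whereas Solr returns an opaque nextCursorMark string. Only the initial mark "*" and the stop condition (the returned mark equals the one sent) follow Solr's documented cursor behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/**
 * Hypothetical sketch, not the SOLR-9668 patch. FakeIndex stands in for a Solr
 * core sorted by "id asc"; its cursor is just the last id returned (real Solr
 * returns an opaque nextCursorMark string instead).
 */
public class CursorPagingSketch {

    static final class FakeIndex {
        private final TreeSet<String> ids;
        long visited = 0; // documents touched: a rough proxy for query cost

        FakeIndex(List<String> ids) { this.ids = new TreeSet<>(ids); }

        // start/rows paging: walks past `start` docs before collecting `rows`.
        List<String> page(int start, int rows) {
            List<String> out = new ArrayList<>();
            int skipped = 0;
            for (String id : ids) {
                if (out.size() == rows) break;
                visited++;
                if (skipped < start) { skipped++; continue; }
                out.add(id);
            }
            return out;
        }

        // Cursor paging: "*" starts at the beginning; otherwise seek past the mark.
        List<String> after(String cursorMark, int rows) {
            Iterable<String> tail =
                "*".equals(cursorMark) ? ids : ids.tailSet(cursorMark, false);
            List<String> out = new ArrayList<>();
            for (String id : tail) {
                if (out.size() == rows) break;
                visited++;
                out.add(id);
            }
            return out;
        }
    }

    // Mirrors the rule in the issue: cursor paging only if cursorMark is exactly "true".
    static List<String> fetchAll(FakeIndex index, Map<String, String> attrs, int rows) {
        List<String> all = new ArrayList<>();
        if ("true".equals(attrs.get("cursorMark"))) {
            String mark = "*";                      // Solr's initial cursorMark value
            while (true) {
                List<String> page = index.after(mark, rows);
                all.addAll(page);
                String next = page.isEmpty() ? mark : page.get(page.size() - 1);
                if (next.equals(mark)) break;       // mark unchanged: no more results
                mark = next;
            }
        } else {
            for (int start = 0; ; start += rows) {  // default start/rows pagination
                List<String> page = index.page(start, rows);
                all.addAll(page);
                if (page.size() < rows) break;
            }
        }
        return all;
    }

    public static void main(String[] args) {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) ids.add(String.format("doc%05d", i));
        for (String flag : new String[] {"true", "false"}) {
            FakeIndex index = new FakeIndex(ids);
            int n = fetchAll(index, Map.of("cursorMark", flag), 100).size();
            System.out.println("cursorMark=" + flag + ": " + n + " docs, "
                + index.visited + " visited");
        }
    }
}
```

With 10,000 ids and rows=100, both settings return the same 10,000 documents, but the offset path visits 515,000 documents against 10,000 for the cursor path, because every offset page re-walks all documents before start. That growth is the toy-model analogue of the degradation reported above; real Solr costs also depend on scoring, sorting and sharding.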



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
