[ 
https://issues.apache.org/jira/browse/SOLR-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev updated SOLR-9668:
-----------------------------------
    Attachment: SOLR-9668.patch

What about [^SOLR-9668.patch]?

> Support cursor paging in SolrEntityProcessor
> --------------------------------------------
>
>                 Key: SOLR-9668
>                 URL: https://issues.apache.org/jira/browse/SOLR-9668
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: contrib - DataImportHandler
>            Reporter: Yegor Kozlov
>            Assignee: Mikhail Khludnev
>            Priority: Minor
>              Labels: dataimportHandler
>             Fix For: master (7.0)
>
>         Attachments: SOLR-9668.patch
>
>
> SolrEntityProcessor paginates using the start and rows parameters, which can 
> be very inefficient at large offsets. In fact, the current implementation is 
> impractical for importing large amounts of data (10M+ documents) because the 
> data import rate degrades from 1000 docs/second to 10 docs/second and the 
> import gets stuck.
> This patch introduces support for cursor paging which offers more or less 
> predictable performance. In my tests the time to fetch the 1st and 1000th 
> pages was about the same and the data import rate was stable throughout the 
> entire import. 
> To enable cursor paging a user needs to add a "sort" attribute in the entity 
> configuration:
> {code}
> <?xml version="1.0" encoding="UTF-8" ?>
> <dataConfig>
>   <document>
>     <!-- The sort attribute turns on cursor paging. It must include the
>          uniqueKey field as a tie breaker. -->
>     <entity name="se" processor="SolrEntityProcessor"
>     query="*:*"
>     rows="1000"
>     sort="id asc"
>     url="http://localhost:8983/solr/collection1">
>     </entity>
>   </document>
> </dataConfig>
> {code}
> If the "sort" attribute is missing then the default start/rows pagination is 
> used.
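
For readers unfamiliar with cursor paging, the request loop described above can be sketched roughly as follows. This is only an illustration of the Solr cursorMark protocol (start with cursorMark=*, pass back the returned nextCursorMark, stop when it no longer changes), not the code in the patch. The names make_fake_solr, fetch_page and iterate_with_cursor are hypothetical, and the in-memory "server" is an assumption made so the sketch runs on its own. Real Solr cursor marks are opaque strings, not document ids.

```python
from typing import Callable, Dict, Iterator, List, Tuple

Doc = Dict[str, str]
Page = Tuple[List[Doc], str]  # (documents on this page, next cursor mark)


def make_fake_solr(docs: List[Doc]) -> Callable[[str, int], Page]:
    """Hypothetical in-memory stand-in for a Solr query sorted by 'id asc'."""
    ordered = sorted(docs, key=lambda d: d["id"])

    def fetch_page(cursor_mark: str, rows: int) -> Page:
        # "*" means "start of the result set". In this sketch any other mark
        # is simply the last id already returned (an assumption; real marks
        # are opaque and must only be passed back unchanged).
        remaining = [d for d in ordered
                     if cursor_mark == "*" or d["id"] > cursor_mark]
        page = remaining[:rows]
        # When nothing is left, echo the incoming mark back, as Solr does.
        next_mark = page[-1]["id"] if page else cursor_mark
        return page, next_mark

    return fetch_page


def iterate_with_cursor(fetch_page: Callable[[str, int], Page],
                        rows: int) -> Iterator[Doc]:
    """Yield every document by following cursor marks instead of offsets."""
    cursor_mark = "*"
    while True:
        page, next_mark = fetch_page(cursor_mark, rows)
        yield from page
        # An unchanged mark signals that the result set is exhausted.
        if next_mark == cursor_mark:
            break
        cursor_mark = next_mark


if __name__ == "__main__":
    sample = [{"id": f"doc{i:04d}"} for i in range(25)]
    fetched = list(iterate_with_cursor(make_fake_solr(sample), rows=10))
    print(len(fetched))
```

Because each request resumes from the previous sort position instead of skipping over start rows, the cost per page does not grow with the depth of the page, which matches the stable import rate reported in the description.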



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

