[
https://issues.apache.org/jira/browse/SOLR-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mikhail Khludnev reassigned SOLR-9668:
--------------------------------------
Assignee: Mikhail Khludnev
> Support cursor paging in SolrEntityProcessor
> --------------------------------------------
>
> Key: SOLR-9668
> URL: https://issues.apache.org/jira/browse/SOLR-9668
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Components: contrib - DataImportHandler
> Reporter: Yegor Kozlov
> Assignee: Mikhail Khludnev
> Priority: Minor
> Labels: dataimportHandler
> Fix For: master (7.0)
>
>
> SolrEntityProcessor paginates using the start and rows parameters, which can
> be very inefficient at large offsets. In fact, the current implementation is
> impractical for importing large amounts of data (10M+ documents) because the
> data import rate degrades from 1000 docs/second to 10 docs/second and the
> import gets stuck.
> This patch introduces support for cursor paging which offers more or less
> predictable performance. In my tests the time to fetch the 1st and 1000th
> pages was about the same and the data import rate was stable throughout the
> entire import.
> To enable cursor paging, add a "sort" attribute to the entity
> configuration:
> {code}
> <?xml version="1.0" encoding="UTF-8" ?>
> <dataConfig>
> <document>
> <!-- The sort attribute turns on cursor paging. It must include the uniqueKey field as a tie breaker. -->
> <entity name="se" processor="SolrEntityProcessor"
> query="*:*"
> rows="1000"
> sort="id asc"
> url="http://localhost:8983/solr/collection1">
> </entity>
> </document>
> </dataConfig>
> {code}
> If the "sort" attribute is missing, the default start/rows pagination is
> used.
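For readers unfamiliar with the two paging modes described above, here is a rough client-side sketch in Python. It is not the patch itself (the patch lives in the Java DataImportHandler code); it only illustrates the request loop, assuming Solr's standard /select parameters (q, rows, sort, start, cursorMark) and the nextCursorMark field in the JSON response. The helper names solr_select and iter_docs are made up for this illustration.

```python
import json
import urllib.parse
import urllib.request


def solr_select(base_url, params):
    """Issue one /select request and return the decoded JSON body.

    Hypothetical helper for this sketch; base_url is a core or
    collection URL such as http://localhost:8983/solr/collection1.
    """
    url = base_url.rstrip("/") + "/select?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def iter_docs(base_url, query="*:*", rows=1000, sort=None, fetch=solr_select):
    """Yield every matching document, one page at a time.

    If sort is given, page with cursorMark; otherwise fall back to
    start/rows, mirroring the behaviour described in the issue.
    """
    params = {"q": query, "rows": rows, "wt": "json"}
    if sort:
        # Cursor paging: the sort must include the uniqueKey field.
        params["sort"] = sort
        cursor = "*"  # "*" asks Solr for the first page
        while True:
            body = fetch(base_url, dict(params, cursorMark=cursor))
            yield from body["response"]["docs"]
            next_cursor = body["nextCursorMark"]
            if next_cursor == cursor:
                return  # an unchanged cursor means no more results
            cursor = next_cursor
    else:
        # Offset paging: Solr must skip `start` documents on every request,
        # which is what becomes slow at large offsets.
        start = 0
        while True:
            docs = fetch(base_url, dict(params, start=start))["response"]["docs"]
            if not docs:
                return
            yield from docs
            start += len(docs)
```

Calling iter_docs(url, rows=1000, sort="id asc") walks the index with a cursor, while omitting sort uses start/rows. The fetch argument exists only so the loop can be exercised without a running Solr instance.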