[ https://issues.apache.org/jira/browse/SOLR-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15788126#comment-15788126 ]
ASF subversion and git services commented on SOLR-9668:
-------------------------------------------------------
Commit cc862d8e67f32d5447599d265f5d126541ed92c9 in lucene-solr's branch
refs/heads/master from [~mkhludnev]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=cc862d8 ]
SOLR-9668: introduce cursorMark='true' for SolrEntityProcessor
> Support cursor paging in SolrEntityProcessor
> --------------------------------------------
>
> Key: SOLR-9668
> URL: https://issues.apache.org/jira/browse/SOLR-9668
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Components: contrib - DataImportHandler
> Reporter: Yegor Kozlov
> Assignee: Mikhail Khludnev
> Priority: Minor
> Labels: dataimportHandler
> Fix For: master (7.0)
>
> Attachments: SOLR-9668.patch, SOLR-9668.patch
>
>
> SolrEntityProcessor paginates using the start and rows parameters, which can
> be very inefficient at large offsets. In fact, the current implementation is
> impractical for importing large amounts of data (10M+ documents) because the
> data import rate degrades from 1000 docs/second to 10 docs/second and the
> import gets stuck.
> This patch introduces support for cursor paging, which offers more or less
> predictable performance. In my tests the time to fetch the 1st and 1000th
> pages was about the same and the data import rate was stable throughout the
> entire import.
> To enable cursor paging a user needs to:
> * add the {{cursorMark='true'}} (!) attribute to the entity configuration;
> * add a {{sort}} attribute to the entity configuration (see the note about sort at
> https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results );
> * remove the {{timeout}} attribute.
> {code}
> <?xml version="1.0" encoding="UTF-8" ?>
> <dataConfig>
>   <document>
>     <entity name="se" processor="SolrEntityProcessor"
>             query="*:*"
>             rows="1000"
>             cursorMark='true'
>             sort="id asc"
>             url="http://localhost:8983/solr/collection1">
>     </entity>
>   </document>
> </dataConfig>
> {code}
> If the {{cursorMark}} attribute is missing or is not {{'true'}} then the
> default start/rows pagination is used.
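
For readers unfamiliar with cursor paging, the request loop that the description relies on can be sketched roughly as follows. This is an illustration of Solr's general cursorMark protocol (start with cursorMark=*, sort on a field that includes the uniqueKey, stop when nextCursorMark equals the cursorMark that was sent), not the Java code from the patch. The names fetch_page, iterate_with_cursor and make_fake_solr are hypothetical; the fake server below uses the last document id as the cursor, whereas real Solr cursor marks are opaque strings.

```python
from typing import Callable, Dict, Iterator, List


def iterate_with_cursor(fetch_page: Callable[[Dict], Dict],
                        rows: int = 1000,
                        sort: str = "id asc") -> Iterator[Dict]:
    """Yield every document by following nextCursorMark until it stops changing.

    fetch_page is a hypothetical callable that sends the given query
    parameters to Solr and returns the parsed JSON response.
    """
    cursor = "*"  # "*" asks Solr to start a new cursor
    while True:
        response = fetch_page({
            "q": "*:*",
            "rows": rows,
            "sort": sort,          # must include the uniqueKey field
            "cursorMark": cursor,
        })
        yield from response["response"]["docs"]
        next_cursor = response["nextCursorMark"]
        if next_cursor == cursor:  # unchanged cursor: no more results
            break
        cursor = next_cursor


def make_fake_solr(ids: List[str]) -> Callable[[Dict], Dict]:
    """Build an in-memory stand-in for a Solr core, for demonstration only."""
    ordered = sorted(ids)

    def fetch_page(params: Dict) -> Dict:
        mark = params["cursorMark"]
        # Assumption: the fake cursor is simply the last id already returned.
        remaining = ordered if mark == "*" else [i for i in ordered if i > mark]
        page = remaining[:params["rows"]]
        next_mark = page[-1] if page else mark
        return {
            "response": {"docs": [{"id": i} for i in page]},
            "nextCursorMark": next_mark,
        }

    return fetch_page


if __name__ == "__main__":
    fake = make_fake_solr([f"doc{i:04d}" for i in range(25)])
    docs = list(iterate_with_cursor(fake, rows=10))
    print(len(docs), docs[0]["id"], docs[-1]["id"])
```

Each request carries only the cursor and the page size, so the server does not have to skip over all previously returned documents the way it does with a large start offset.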