[ https://issues.apache.org/jira/browse/SOLR-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mikhail Khludnev updated SOLR-9668:
-----------------------------------
    Attachment: SOLR-9668.patch

what about [^SOLR-9668.patch]?

> Support cursor paging in SolrEntityProcessor
> --------------------------------------------
>
>                 Key: SOLR-9668
>                 URL: https://issues.apache.org/jira/browse/SOLR-9668
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: contrib - DataImportHandler
>            Reporter: Yegor Kozlov
>            Assignee: Mikhail Khludnev
>            Priority: Minor
>              Labels: dataimportHandler
>             Fix For: master (7.0)
>
>         Attachments: SOLR-9668.patch
>
>
> SolrEntityProcessor paginates using the start and rows parameters, which can
> be very inefficient at large offsets. In fact, the current implementation is
> impractical for importing large amounts of data (10M+ documents). The data
> import rate degrades from 1000 docs/second to 10 docs/second, and the import
> gets stuck.
> This patch introduces support for cursor paging, which offers more
> predictable performance. In my tests, fetching the 1st and the 1000th pages
> took about the same time, and the data import rate was stable throughout the
> entire import.
> To enable cursor paging, a user needs to add a "sort" attribute to the entity
> configuration:
> {code}
> <?xml version="1.0" encoding="UTF-8" ?>
> <dataConfig>
>   <document>
>     <!-- The "sort" attribute turns on cursor paging. The sort must include
>          the uniqueKey field (here "id") as a tie-breaker. -->
>     <entity name="se" processor="SolrEntityProcessor"
>             query="*:*"
>             rows="1000"
>             sort="id asc"
>             url="http://localhost:8983/solr/collection1">
>     </entity>
>   </document>
> </dataConfig>
> {code}
> If the "sort" attribute is missing, the default start/rows pagination is
> used.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
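The difference between start/rows paging and cursor paging in the issue description can be modelled without Solr. The Java sketch below is a simplified in-memory model. It is not Solr's cursorMark implementation and not the code in SOLR-9668.patch. A TreeSet of hypothetical ids stands in for an index sorted on the uniqueKey (`id asc`), and the "cursor" is just the last id returned. The class and method names (`CursorPagingSketch`, `fetchByOffset`, `fetchAfter`, `importAll`) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.Objects;
import java.util.TreeSet;

public class CursorPagingSketch {

    // Hypothetical "index": n uniqueKey values kept sorted (id asc) by the TreeSet.
    static NavigableSet<String> buildIndex(int n) {
        NavigableSet<String> index = new TreeSet<>();
        for (int i = 0; i < n; i++) {
            index.add(String.format("doc%05d", i)); // zero-padded so string order == numeric order
        }
        return index;
    }

    // start/rows paging: walks past `start` entries before collecting a page,
    // so the cost of each page grows with its offset.
    static List<String> fetchByOffset(NavigableSet<String> index, int start, int rows) {
        List<String> page = new ArrayList<>();
        int skipped = 0;
        for (String id : index) {
            if (skipped < start) { skipped++; continue; }
            if (page.size() == rows) break;
            page.add(id);
        }
        return page;
    }

    // Cursor paging: resumes strictly after the last id seen (null = first page),
    // so no entries are skipped no matter how deep the import is.
    static List<String> fetchAfter(NavigableSet<String> index, String cursor, int rows) {
        NavigableSet<String> tail = (cursor == null) ? index : index.tailSet(cursor, false);
        List<String> page = new ArrayList<>();
        for (String id : tail) {
            if (page.size() == rows) break;
            page.add(id);
        }
        return page;
    }

    // Import loop: move the cursor to the last id of each page and stop once it no
    // longer changes (analogous to Solr's "nextCursorMark equals cursorMark" end condition).
    static List<String> importAll(NavigableSet<String> index, int rows) {
        List<String> imported = new ArrayList<>();
        String cursor = null;
        while (true) {
            List<String> page = fetchAfter(index, cursor, rows);
            imported.addAll(page);
            String next = page.isEmpty() ? cursor : page.get(page.size() - 1);
            if (Objects.equals(next, cursor)) break;
            cursor = next;
        }
        return imported;
    }

    public static void main(String[] args) {
        NavigableSet<String> index = buildIndex(10);
        System.out.println(fetchByOffset(index, 3, 3));        // [doc00003, doc00004, doc00005]
        System.out.println(fetchAfter(index, "doc00002", 3));  // [doc00003, doc00004, doc00005]
        System.out.println(importAll(index, 4).size());        // 10
    }
}
```

The cursor can only name an exact position because every id is unique. If the sort were on a non-unique field, "the last value seen" could match several documents, so the next page might skip or repeat some of them. That is why the sort must include the uniqueKey field as a tie-breaker.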