cgivre opened a new pull request #2414: URL: https://github.com/apache/drill/pull/2414
# [DRILL-8092](https://issues.apache.org/jira/browse/DRILL-8092): Add Auto Pagination to HTTP Storage Plugin

## Description
This PR adds the ability for Drill to access APIs that paginate their results. For example, suppose an API limits you to 100 records per page. This improvement allows Drill to execute multiple HTTP requests to retrieve the complete dataset.

This update works in two ways: with a limit and without. When a limit is pushed down, the new paginator object generates the correct number of URLs and BatchReaders, executes the queries, and returns the results. Currently, these requests are executed in series, but future work could parallelize them. When a limit is not pushed down, the reader keeps generating URLs and retrieving data until the row count of a returned page is less than the page size.

## Documentation
(From README)

Remote APIs frequently implement some sort of pagination as a way of limiting results. However, if you are performing bulk data analysis, it is necessary to reassemble the data into one larger dataset. Drill's auto-pagination features allow this to happen in the background, so that the user gets clean data back.

To use a paginator, configure it in the connection for the particular API.

## Offset Pagination
Offset pagination uses parameters similar to SQL's `LIMIT` and `OFFSET`. For example, if you want 200 records and the maximum page size is 50 records, the offset paginator breaks your query into 4 requests, as shown below:

* myapi.com?limit=50&offset=0
* myapi.com?limit=50&offset=50
* myapi.com?limit=50&offset=100
* myapi.com?limit=50&offset=150

### Configuring Offset Pagination
To configure an offset paginator, add the following to the configuration for your connection.
```json
"paginator": {
  "limitField": "<limit>",
  "offsetField": "<offset>",
  "maxPageSize": 100,
  "method": "OFFSET"
}
```

## Page Pagination
Page pagination is very similar to offset pagination except instead of using an `OFFSET` it uses a page number.

```json
"paginator": {
  "pageField": "page",
  "pageSizeField": "per_page",
  "maxPageSize": 100,
  "method": "PAGE"
}
```

## Testing
Added unit tests and tested manually.
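
## Illustration of the Offset Logic
For reviewers who want to see the two modes from the description side by side, here is a minimal Python sketch of the offset logic. It is not the plugin's Java code. The function names `offset_urls` and `fetch_all`, the `fetch_page` callback, and the default field names are all hypothetical, and the sketch assumes every request asks for a full page and leaves any trimming of surplus rows to the caller.

```python
from typing import Callable, List


def offset_urls(base_url: str, limit: int, max_page_size: int,
                limit_field: str = "limit",
                offset_field: str = "offset") -> List[str]:
    """Limit pushed down: build one URL per page, up front.

    Assumption: every request asks for max_page_size rows, so the
    last page may return more rows than the limit strictly needs.
    """
    return [
        f"{base_url}?{limit_field}={max_page_size}&{offset_field}={offset}"
        for offset in range(0, limit, max_page_size)
    ]


def fetch_all(fetch_page: Callable[[int], list], max_page_size: int) -> list:
    """No limit pushed down: keep requesting pages until a short one arrives.

    fetch_page is a hypothetical callback that takes an offset and
    returns the rows for that page.
    """
    rows: list = []
    offset = 0
    while True:
        page = fetch_page(offset)
        rows.extend(page)
        # A page smaller than the page size signals the end of the data.
        if len(page) < max_page_size:
            break
        offset += max_page_size
    return rows


if __name__ == "__main__":
    # Reproduces the 200-record, 50-per-page example from the README text.
    for url in offset_urls("myapi.com", limit=200, max_page_size=50):
        print(url)

    # Simulated API holding 120 rows, served 50 at a time.
    data = list(range(120))
    rows = fetch_all(lambda off: data[off:off + 50], max_page_size=50)
    print(len(rows))
```

Note that when the total row count is an exact multiple of the page size, the stopping rule in `fetch_all` costs one extra request that returns an empty page.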
