[ 
https://issues.apache.org/jira/browse/DRILL-8092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17464891#comment-17464891
 ] 

ASF GitHub Bot commented on DRILL-8092:
---------------------------------------

cgivre opened a new pull request #2414:
URL: https://github.com/apache/drill/pull/2414


   # [DRILL-8092](https://issues.apache.org/jira/browse/DRILL-8092): Add Auto 
Pagination to HTTP Storage Plugin
   
   ## Description
   This PR adds the ability for Drill to access APIs that paginate their 
results.  For example, if an API limits you to 100 records per page, this 
improvement allows Drill to execute multiple HTTP requests to retrieve the 
complete dataset.
   
   This update works in two ways: with a limit and without.  When a limit is 
pushed down, the new paginator object generates the correct number of URLs 
and BatchReaders, executes the queries and returns the results.  Currently, 
the requests are executed in series, but future work could parallelize them.
   
   When a limit is not pushed down, the reader keeps generating URLs and 
retrieving data until a page returns fewer rows than the page size.
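
The no-limit behaviour described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `fetch_all`, `fetch_page`, and `fake_api` are hypothetical names, and the stand-in API simply slices an in-memory list.

```python
from typing import Callable, Iterator, List


def fetch_all(fetch_page: Callable[[int, int], List[dict]],
              page_size: int) -> Iterator[dict]:
    """Yield rows page by page until a page comes back short.

    fetch_page(offset, limit) is a hypothetical callable that performs
    one HTTP request and returns the rows of that page.
    """
    offset = 0
    while True:
        rows = fetch_page(offset, page_size)
        yield from rows
        # A page with fewer rows than page_size means the data is exhausted.
        if len(rows) < page_size:
            break
        offset += page_size


def fake_api(offset: int, limit: int) -> List[dict]:
    """Stand-in for a remote API that holds 230 records (assumption)."""
    data = [{"id": i} for i in range(230)]
    return data[offset:offset + limit]


rows = list(fetch_all(fake_api, 100))
```

With this stopping rule, a dataset whose size is an exact multiple of the page size costs one extra request, which returns an empty page.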
   
   ## Documentation
   (From README)
   Remote APIs frequently implement some sort of pagination as a way of 
limiting results.  However, if you are performing bulk data analysis, it is 
necessary to reassemble the data into one larger dataset.  Drill's 
auto-pagination features allow this to happen in the background, so that the 
user gets clean data back.
   
   To use a paginator, configure it in the connection for the particular API.
   
   ## Offset Pagination
   Offset pagination uses parameters similar to SQL's `LIMIT` and `OFFSET`.  
For example, if you want 200 records and the maximum page size is 50 
records, the offset paginator will break up your query into 4 requests as 
shown below:
   
   * myapi.com?limit=50&offset=0
   * myapi.com?limit=50&offset=50
   * myapi.com?limit=50&offset=100
   * myapi.com?limit=50&offset=150
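
As a rough illustration of how such a list of requests could be derived, the sketch below computes the URLs from a limit and a maximum page size. It is not the paginator class from this PR: `offset_urls` is a hypothetical helper, and trimming the last page to the remaining row count is an assumption about the desired behaviour.

```python
from math import ceil
from urllib.parse import urlencode


def offset_urls(base_url, limit, max_page_size,
                limit_field="limit", offset_field="offset"):
    """Build one URL per page needed to cover `limit` rows (hypothetical helper)."""
    pages = ceil(limit / max_page_size)
    urls = []
    for page in range(pages):
        offset = page * max_page_size
        # Assumption: the final page only asks for the rows still missing.
        size = min(max_page_size, limit - offset)
        query = urlencode({limit_field: size, offset_field: offset})
        urls.append(f"{base_url}?{query}")
    return urls


urls = offset_urls("myapi.com", 200, 50)
```

The `limit_field` and `offset_field` arguments play the role of the query parameter names that the API expects.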
   
   ### Configuring Offset Pagination
   To configure an offset paginator, add the following to the configuration 
for your connection.
   
   ```json
   "paginator": {
      "limitField": "<limit>",
      "offsetField": "<offset>",
      "maxPageSize": 100,
      "method": "OFFSET"
   }
   ```
   
   ## Page Pagination
   Page pagination is very similar to offset pagination, except that it uses 
a page number instead of an `OFFSET`.
   
   ```json
   "paginator": {
      "pageField": "page",
      "pageSizeField": "per_page",
      "maxPageSize": 100,
      "method": "PAGE"
   }
   ```
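
To make the difference from offset pagination concrete, here is a small sketch of the requests a page paginator might generate for the field names in the configuration above. `page_urls` is a hypothetical helper, and numbering pages from 1 is an assumption; some APIs start at 0.

```python
from math import ceil
from urllib.parse import urlencode


def page_urls(base_url, limit, max_page_size,
              page_field="page", page_size_field="per_page", first_page=1):
    """Build one URL per page needed to cover `limit` rows (hypothetical helper)."""
    pages = ceil(limit / max_page_size)
    return [
        f"{base_url}?{urlencode({page_field: n, page_size_field: max_page_size})}"
        for n in range(first_page, first_page + pages)
    ]


urls = page_urls("myapi.com", 250, 100)
```

Here a request for 250 rows with a page size of 100 yields three URLs, for pages 1 through 3.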
   
   ## Testing
   Added unit tests and tested manually. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


> Add Auto Pagination to HTTP Storage Plugin
> ------------------------------------------
>
>                 Key: DRILL-8092
>                 URL: https://issues.apache.org/jira/browse/DRILL-8092
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Other
>    Affects Versions: 1.19.0
>            Reporter: Charles Givre
>            Assignee: Charles Givre
>            Priority: Major
>             Fix For: 1.20.0
>
>
> See github



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
