[
https://issues.apache.org/jira/browse/DRILL-8092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470061#comment-17470061
]
ASF GitHub Bot commented on DRILL-8092:
---------------------------------------
cgivre commented on a change in pull request #2414:
URL: https://github.com/apache/drill/pull/2414#discussion_r779707142
##########
File path:
contrib/storage-http/src/main/java/org/apache/drill/exec/store/http/paginator/OffsetPaginator.java
##########
@@ -0,0 +1,117 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.http.paginator;
+
+import okhttp3.HttpUrl;
+import okhttp3.HttpUrl.Builder;
+import org.apache.drill.common.exceptions.UserException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.List;
+
+public class OffsetPaginator extends Paginator {
+
+ private static final Logger logger = LoggerFactory.getLogger(OffsetPaginator.class);
+
+ private final int limit;
+ private final String limitField;
+ private final String offsetField;
+ private int offset;
+
+ /**
+ * This class implements the idea of an Offset Paginator. See here for a complete explanation:
+ * https://nordicapis.com/everything-you-need-to-know-about-api-pagination/
+ * <p>
+ *
+ * @param builder The okhttp3 URL builder which has the API root URL
+ * @param limit The limit clause from the SQL query
+ * @param maxPageSize The maximum page size permitted by the API documentation
+ * @param limitField The field name which corresponds to the limit field from the API
+ * @param offsetField The field name which corresponds to the offset field from the API
+ */
+ public OffsetPaginator(Builder builder, int limit, int maxPageSize, String limitField, String offsetField) {
+ super(builder, paginationMode.OFFSET, maxPageSize, limit > 0);
+
+ // Page size must be greater than zero; validate before it is used below
+ if (maxPageSize <= 0) {
+ throw UserException
+ .validationError()
+ .message("API page size must be greater than zero")
+ .build(logger);
+ }
+
+ this.limit = limit;
+ this.limitField = limitField;
+ this.offsetField = offsetField;
+ this.offset = 0;
+ this.paginatedUrls = buildPaginatedURLs();
+ }
+
+ public int getLimit() {
+ return limit;
+ }
+
+ @Override
+ public String next() {
+ if (hasLimit) {
+ return super.next();
+ } else {
+ return generateNextUrl();
+ }
+ }
+
+ @Override
+ public String generateNextUrl() {
+ builder.removeAllEncodedQueryParameters(offsetField);
+ builder.removeAllEncodedQueryParameters(limitField);
+
+ builder.addQueryParameter(offsetField, String.valueOf(offset));
+ builder.addQueryParameter(limitField, String.valueOf(maxPageSize));
+ offset += maxPageSize;
+
+ return builder.build().url().toString();
+ }
+
+ /**
+ * Build the paginated URLs. If the parameters are invalid, return a list with the original URL.
+ *
+ * @return List of paginated URLs
+ */
+ @Override
+ public List<HttpUrl> buildPaginatedURLs() {
+ this.paginatedUrls = new ArrayList<>();
Review comment:
Thanks for the comment. The rationale for this was as follows: if you are
running a query with a limit, you know in advance how many URLs to generate,
and hence you can build the list up front. My first pass at this was to simply
require all paginated API queries to have a `LIMIT` clause.
Then I realized that if I held to that standard, you couldn't run aggregate
queries or handle any other situation where the limit doesn't get pushed down.
So I refactored a bit: if the limit is known, we go with plan A and build the
list; if not, plan B is to keep issuing requests and stop generating new ones
once a request returns fewer than `pageSize` results.
Why is this important?
I'm glad you asked. One possibility for future work would be to parallelize
the requests, and the only way to do that is with a list of URLs. So, if
you're ok with it, I'd like to keep this so that if/when we decide to
parallelize, the functionality is already in place.
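Roughly, the two plans described above could be sketched like this. This is a hypothetical, self-contained illustration (the class and method names are mine, and plain string concatenation stands in for okhttp3's `HttpUrl.Builder`), not the actual Drill implementation:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the two offset-pagination strategies; not Drill's OffsetPaginator.
public class OffsetSketch {

  // Plan A: the LIMIT was pushed down, so every paginated URL can be built up front.
  public static List<String> buildUrls(String base, int limit, int pageSize) {
    List<String> urls = new ArrayList<>();
    for (int offset = 0; offset < limit; offset += pageSize) {
      urls.add(base + "?offset=" + offset + "&limit=" + pageSize);
    }
    return urls;
  }

  // Plan B: no LIMIT available; the caller asks for one URL at a time and stops
  // once a response comes back with fewer than pageSize rows.
  public static String nextUrl(String base, int offset, int pageSize) {
    return base + "?offset=" + offset + "&limit=" + pageSize;
  }
}
```

With plan A, `buildUrls("http://api.example.com/data", 5, 2)` yields three URLs (offsets 0, 2, 4), a fixed list that could later be fanned out in parallel; plan B only ever knows the next URL, so it is inherently sequential.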
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
> Add Auto Pagination to HTTP Storage Plugin
> ------------------------------------------
>
> Key: DRILL-8092
> URL: https://issues.apache.org/jira/browse/DRILL-8092
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - Other
> Affects Versions: 1.19.0
> Reporter: Charles Givre
> Assignee: Charles Givre
> Priority: Major
> Fix For: 1.20.0
>
>
> See github
--
This message was sent by Atlassian Jira
(v8.20.1#820001)