Peter Turcsanyi created NIFI-14291:
--------------------------------------

             Summary: Refactor recursive folder listing in ListGoogleDrive
                 Key: NIFI-14291
                 URL: https://issues.apache.org/jira/browse/NIFI-14291
             Project: Apache NiFi
          Issue Type: Bug
            Reporter: Peter Turcsanyi
            Assignee: Peter Turcsanyi


Recursive folder listing in ListGoogleDrive is currently implemented in two phases:
 # traverse the folder structure from the base folder and collect all subfolder ids recursively (files are skipped in this step)
 # execute a single combined query with all the folder ids (like "folder_id_1 in parents or folder_id_2 in parents or ...")
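To illustrate why phase 2 scales poorly, here is a minimal sketch of how such a combined query grows with the number of folders. It assumes the Drive v3 query form `'<id>' in parents` joined with `or`; the class and method names are hypothetical and are not the actual ListGoogleDrive code.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class CompositeQuerySketch {

    // Hypothetical helper mirroring phase 2: one "in parents" clause per folder id,
    // all joined with "or" into a single query string.
    static String buildCompositeQuery(List<String> folderIds) {
        return folderIds.stream()
                .map(id -> "'" + id + "' in parents")
                .collect(Collectors.joining(" or ", "(", ")"));
    }

    public static void main(String[] args) {
        // Simulate folder trees of increasing size using made-up ids.
        for (int n : new int[] {1, 10, 100, 1000}) {
            List<String> ids = IntStream.range(0, n)
                    .mapToObj(i -> "folder_id_" + i)
                    .collect(Collectors.toList());
            String query = buildCompositeQuery(ids);
            System.out.println(n + " folders -> " + n + " clauses, query length " + query.length());
        }
    }
}
```

Both the clause count and the query length grow linearly with the number of subfolders, so a large enough folder tree eventually produces a query the Drive service rejects.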

The combined query may lead to the following errors:
 * the query is too large or complex and fails: [https://stackoverflow.com/questions/29738020/google-drive-api-limit-on-search-query-parameter]
 * in rare cases, some files are silently skipped and not returned by the Drive service, for no apparent reason: [https://stackoverflow.com/questions/60131503/is-it-possible-to-query-for-multiple-folders-parents-using-googles-drive-api]

Refactor the recursive listing to traverse and query the folders one by one (as other List processors do).
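A minimal sketch of the proposed one-by-one approach, assuming a hypothetical `DriveClient` interface whose `listChildren` call stands in for a single Drive query of the form `'<folderId>' in parents` (paging omitted). The `Item` record and all names are illustrative, not the processor's real API; an in-memory fake replaces the Drive service so the sketch runs on its own.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PerFolderListingSketch {

    // Hypothetical item model: a file or a folder with an id and a name.
    record Item(String id, String name, boolean isFolder) {}

    // Hypothetical client: each call represents one small Drive query for a single folder.
    interface DriveClient {
        List<Item> listChildren(String folderId);
    }

    // Breadth-first walk that issues one query per folder instead of one combined query.
    static List<Item> listRecursively(DriveClient client, String baseFolderId) {
        List<Item> files = new ArrayList<>();
        Deque<String> pending = new ArrayDeque<>();
        Set<String> visited = new HashSet<>(); // defensive: never query the same folder twice
        pending.add(baseFolderId);
        while (!pending.isEmpty()) {
            String folderId = pending.poll();
            if (!visited.add(folderId)) {
                continue;
            }
            for (Item child : client.listChildren(folderId)) {
                if (child.isFolder()) {
                    pending.add(child.id()); // queue the subfolder for its own query
                } else {
                    files.add(child);
                }
            }
        }
        return files;
    }

    public static void main(String[] args) {
        // In-memory fake tree: root -> (a.txt, sub), sub -> (b.txt)
        Map<String, List<Item>> tree = Map.of(
                "root", List.of(new Item("f1", "a.txt", false), new Item("sub", "sub", true)),
                "sub", List.of(new Item("f2", "b.txt", false)));
        DriveClient fake = folderId -> tree.getOrDefault(folderId, List.of());
        for (Item file : listRecursively(fake, "root")) {
            System.out.println(file.name()); // prints a.txt, then b.txt
        }
    }
}
```

Each query stays small regardless of how deep or wide the folder tree is; the trade-off is one request per folder instead of one overall request.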

{code:java}
2025-02-24 11:44:57,436 ERROR [Timer-Driven Process Thread-5] o.a.n.p.gcp.drive.ListGoogleDrive ListGoogleDrive[id=c96787d9-bdbc-3511-db04-6529ae078f7f] Failed to perform listing on remote host due to 400 Bad Request
POST https://www.googleapis.com/drive/v3/files
{
  "code": 400,
  "errors": [
    {
      "domain": "global",
      "location": "q",
      "locationType": "parameter",
      "message": "The query is too complex.",
      "reason": "invalid"
    }
  ],
  "message": "The query is too complex."
}
com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request
POST https://www.googleapis.com/drive/v3/files
{
  "code": 400,
  "errors": [
    {
      "domain": "global",
      "location": "q",
      "locationType": "parameter",
      "message": "The query is too complex.",
      "reason": "invalid"
    }
  ],
  "message": "The query is too complex."
}
        at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:145)
        at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:118)
        at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:37)
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest$3.interceptResponse(AbstractGoogleClientRequest.java:479)
        at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1111)
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:565)
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:506)
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:616)
        at org.apache.nifi.processors.gcp.drive.ListGoogleDrive.performListing(ListGoogleDrive.java:278)
        at org.apache.nifi.processor.util.list.AbstractListProcessor.listByNoTracking(AbstractListProcessor.java:460)
        at org.apache.nifi.processor.util.list.AbstractListProcessor.onTrigger(AbstractListProcessor.java:426)
        at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
        at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1272)
        at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:244)
        at org.apache.nifi.controller.scheduling.AbstractTimeBasedSchedulingAgent.lambda$doScheduleOnce$0(AbstractTimeBasedSchedulingAgent.java:59)
        at org.apache.nifi.engine.FlowEngine.lambda$wrap$1(FlowEngine.java:105)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
        at java.base/java.lang.Thread.run(Thread.java:1583)
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
