[GitHub] [pinot] mrtnhorn opened a new issue, #11050: Using JSON_MATCH inside of IN_SUBQUERY

via GitHub Thu, 06 Jul 2023 15:57:42 -0700


mrtnhorn opened a new issue, #11050:
URL: https://github.com/apache/pinot/issues/11050


   Hello! Unsure if this is a bug or an unsupported feature. We are trying to 
run an aggregation on top of a filtered list of IDs, which we would like to 
gather using `JSON_MATCH` rather than `JSON_EXTRACT_SCALAR`.
   
   Here is the query:
   
   ```
   SELECT annotation_type, annotation, COUNT(*)
   FROM annotations
   WHERE IN_SUBQUERY(
       doc_id,
        '
                SELECT ID_SET(doc_id)
                FROM annotations
                WHERE JSON_MATCH(annotation, '' "$.id" = 
''d0eb419e-c2c4-4815-8085-7a4ea8874dbf'' '')
        '
   ) = 1
   GROUP BY 1, 2
   ORDER BY 3 DESC
   LIMIT 10;
   ```
   
   It results in this error:
   ```
   [
     {
       "errorCode": 200,
       "message": "QueryExecutionError:\njava.lang.RuntimeException: Caught 
exception while executing subquery: \n\t\tSELECT ID_SET(doc_id)\n\t\tFROM 
annotations\n\t\tWHERE JSON_MATCH(annotation, ' \"$.id\" = 
'd0eb419e-c2c4-4815-8085-7a4ea8874dbf' ')\n\t"
     }
   ]
   ```
   
   The subquery itself works fine:
   ```
   SELECT ID_SET(doc_id)
   FROM annotations
   WHERE JSON_MATCH(annotation, ' "$.id" = 
''d0eb419e-c2c4-4815-8085-7a4ea8874dbf'' ');
   ```
   
   Thought there might be an issue with escaping single quotes, but if you add 
another set of escaping single quotes around 
`''d0eb419e-c2c4-4815-8085-7a4ea8874dbf''`, you get the same error (in fact, 
the extra single quotes are stripped out in the error message). We thought 
there might be an issue with the double quotes being escaped, but were unable 
to create any variation that worked.
   
   We can successfully use `JSON_EXTRACT_SCALAR`, but it takes several seconds 
to run on O(100k) rows:
   ```
   SELECT annotation_type, annotation, count(*)
   FROM annotations
   WHERE IN_SUBQUERY(
      doc_id,
      '
                SELECT ID_SET(doc_id)
                FROM annotation
                WHERE JSON_EXTRACT_SCALAR(annotation, ''$.id'', ''STRING'', 
''null'') = ''d0eb419e-c2c4-4815-8085-7a4ea8874dbf''
     '
   ) = 1
   GROUP BY 1, 2
   ORDER BY 3 DESC
   LIMIT 10;
   ```
   
   We'd like to utilize `JSON_MATCH` as it runs faster on the subquery than 
`JSON_EXTRACT_SCALAR`.
   
   Are we doing something wrong, is this a bug, or is this an unsupported 
feature? If the last option, is there any plan to add support?
   
   Thank you!
   
   P.S. We are running v0.13.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [pinot] mrtnhorn opened a new issue, #11050: Using JSON_MATCH inside of IN_SUBQUERY

Reply via email to