alduleacristi opened a new issue, #18152:
URL: https://github.com/apache/druid/issues/18152

   I'm working with Apache Druid and have introduced a second timestamp column 
called ingestionTimestamp to support a deduplication job. Additionally, I have 
a column named tags, which is a multi-value VARCHAR column.
   
   The deduplication is performed using an MSQ (Multi-Stage Query) like the 
following:
   
    REPLACE INTO "target-datasource"
    OVERWRITE ALL

    SELECT
        __time,
        "entityUID",
        LATEST_BY("entityId", MILLIS_TO_TIMESTAMP("ingestionTimestamp")) AS "entityId",
        LATEST_BY("entityName", MILLIS_TO_TIMESTAMP("ingestionTimestamp")) AS "entityName",
        LATEST_BY("tagSetA", MILLIS_TO_TIMESTAMP("ingestionTimestamp")) AS "tagSetA",
        LATEST_BY("tagSetB", MILLIS_TO_TIMESTAMP("ingestionTimestamp")) AS "tagSetB",
        MAX("ingestionTimestamp") AS "ingestionTimestamp"
    FROM "target-datasource"
    GROUP BY
        __time,
        "entityUID"
    PARTITIONED BY 'P1M';
   
   **Problem:**
   After running this query, the multi-value columns (tagSetA, tagSetB) are no 
longer stored in a multi-value format. This breaks downstream queries that rely 
on the multi-value nature of these columns.
   
   **My understanding:**
   MSQ may not preserve multi-value columns through aggregation: when a 
multi-value string dimension is grouped on or fed to an aggregator such as 
LATEST_BY, it appears to be implicitly unnested or coerced to a plain string, 
so the result is written back as a single-value column.
   
   **Question:**
   How can I run this kind of deduplication query while preserving the 
multi-value format of these columns? Is there a recommended approach or 
workaround in Druid to handle this scenario?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

