amrishlal commented on issue #7004:
URL: 
https://github.com/apache/incubator-pinot/issues/7004#issuecomment-884523925


   I am a bit confused by these two statements:
   > When streaming from Kafka, Pinot currently lacks of a way to allow users 
to uniquely identify messages
   
   and
   
   > where offset > last_recorded_offset
   
   It seems that in the first case you are looking for a globally unique 
identifier for each row. I am assuming this would involve something like a UUID 
generator that attaches a UUID to each row as it is ingested (?). In the 
second case, it seems you are looking for a "rowid" with the additional 
criterion that it should be monotonically increasing and comparable.
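
   To make the distinction concrete, here is a minimal sketch (not Pinot 
code). `RowId` and `newer_than` are hypothetical names used only for 
illustration, and the sketch assumes a Kafka-style offset that is ordered only 
within a single partition.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class RowId:
    # Hypothetical composite id for illustration; not an existing Pinot type.
    partition: int
    offset: int


def newer_than(row: RowId, last_recorded: RowId) -> bool:
    """Return True if `row` comes after `last_recorded` in the same partition."""
    if row.partition != last_recorded.partition:
        # Offsets from different partitions have no meaningful order.
        raise ValueError("offsets from different partitions are not comparable")
    return row.offset > last_recorded.offset


# Case 1: globally unique, but comparing two random UUIDs says nothing
# about which row was ingested first.
a, b = uuid.uuid4(), uuid.uuid4()
unique = a != b

# Case 2: "where offset > last_recorded_offset" needs a comparable id.
last = RowId(partition=0, offset=41)
rows = [RowId(0, 40), RowId(0, 42), RowId(0, 43)]
fresh = [r for r in rows if newer_than(r, last)]
```

   The UUID satisfies only the uniqueness requirement, while the 
(partition, offset) pair satisfies the ordering requirement only within one 
partition.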
   
   I am not sure it is possible to do both with a reasonable amount of 
effort (i.e., generate a globally unique identifier that is monotonically 
increasing and hence also comparable across all rows of all segments), 
especially when one considers that we commonly replace segments and also 
perform update operations such as UPSERT. Unless I am missing something, it 
might be done with a cluster-wide ID generation service in Pinot (?). The first 
(UUID generation) can probably be done now at ingestion time using an ingestion 
transform function (?). The second looks very difficult to implement and get 
right (?).
   
   I think we need more clarity on what exactly is being implemented here: 1) 
a dynamically generated ROWID over the result set only (for supporting 
cursors), 2) a column that identifies each row with a globally unique 
identifier (useful for partitioning, indexing, etc.), or 3) a ROWID generated 
for each row at row creation time that is globally unique, comparable across 
all rows and all segments, and kept up to date through operations such as 
segment replacement, UPSERT, etc.?
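
   As a rough illustration of option 1 only, the sketch below numbers the 
rows of an already materialized result set and pages through it by row id. 
`with_rowid` and `fetch_page` are hypothetical helpers, not Pinot APIs, and the 
sketch assumes the result set has a deterministic sort key.

```python
from typing import Dict, List, Tuple

NumberedRow = Tuple[int, Dict]


def with_rowid(rows: List[Dict], order_key: str) -> List[NumberedRow]:
    """Sort the result set and attach a 0-based row id valid for this result only."""
    ordered = sorted(rows, key=lambda r: r[order_key])
    return list(enumerate(ordered))


def fetch_page(numbered: List[NumberedRow], after_rowid: int, page_size: int) -> List[NumberedRow]:
    """Return up to `page_size` rows whose row id is greater than `after_rowid`."""
    return [(rid, row) for rid, row in numbered if rid > after_rowid][:page_size]


result = with_rowid([{"ts": 3}, {"ts": 1}, {"ts": 2}], "ts")
first_page = fetch_page(result, after_rowid=-1, page_size=2)
second_page = fetch_page(result, after_rowid=1, page_size=2)
```

   Such ids are meaningful only within one result set, which is why this 
option avoids the segment replacement and UPSERT concerns of option 3.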


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


