amrishlal commented on pull request #8413:
URL: https://github.com/apache/pinot/pull/8413#issuecomment-1079489100


   > Let's hold a little bit on merging this and have some high level 
discussion first.
   > 
   > We intentionally reject ingestion transform with same input and output 
column because it is not idempotent, and can cause unexpected behavior if by 
any chance the same record is transformed twice. Also, in certain scenarios, 
the input data might already have the final column generated, and we just skip 
the transform. I would be super careful on this change because we need to 
ensure the record is never transformed twice. Another concern is that if the 
ingestion transform changes, there is no way to re-generate the derived column 
because the original values are already changed. IMO, loosening this restriction 
can easily cause unexpected behavior, and might not be worth it.
   
   The problem we are running into is that for GDPR and similar requirements, 
we need to be able to purge records from a segment based on the values of a 
particular field. If we change the name of the field that is being ingested 
into Pinot, then we lose the information that column 'x' in the Pinot table 
actually came from field 'y' in the Kafka event / Avro schema, and hence cannot 
purge records automatically in minion based on the original Avro schema field 
name 'y'.
   
   Definitely open to suggestions and discussion, but my understanding is that 
ingestion transform functions are applied only during ingestion, where the 
original field is in Kafka/Avro and the transformed value goes into the Pinot 
column, so this should be safe, right? If you have any particular use case that 
may not be safe, I can try it out.
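
   To make the idempotency concern from the quoted comment concrete, here is a 
minimal sketch in plain Python. It is not Pinot code: `apply_transform`, 
`times_thousand`, and the column names `ts` / `ts_ms` are hypothetical and only 
illustrate what happens if the same record were ever transformed twice.

```python
def apply_transform(record, src, dst, fn):
    """Return a copy of `record` with record[dst] set to fn(record[src]).

    Hypothetical helper for illustration only; not a Pinot API.
    """
    out = dict(record)
    out[dst] = fn(record[src])
    return out


def times_thousand(value):
    # Example transform: seconds -> milliseconds.
    return value * 1000


event = {"ts": 5}

# Distinct input and output columns: re-applying the transform reads the
# untouched source column, so the result does not change.
once = apply_transform(event, "ts", "ts_ms", times_thousand)
twice = apply_transform(once, "ts", "ts_ms", times_thousand)

# Same input and output column: the second application reads the already
# transformed value, so the result drifts.
once_same = apply_transform(event, "ts", "ts", times_thousand)
twice_same = apply_transform(once_same, "ts", "ts", times_thousand)

print(once == twice)                        # distinct columns: stable
print(once_same["ts"], twice_same["ts"])    # same column: value changes
```

   With distinct columns the second pass is a no-op, while with the same 
column the value is scaled again. So the change is only safe if the transform 
is guaranteed to run exactly once per record, which is my assumption above.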


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


