Ashish M G created HUDI-1267:
--------------------------------

             Summary: Additional Metadata Details for Hudi Transactions
                 Key: HUDI-1267
                 URL: https://issues.apache.org/jira/browse/HUDI-1267
             Project: Apache Hudi
          Issue Type: Improvement
            Reporter: Ashish M G


Whenever any of the following scenarios occurs:
 # Custom Datasource ( Kafka for instance ) -> Hudi Table
 # Hudi -> Hudi Table
 # S3 -> Hudi Table

The following metadata needs to be captured:
 # Table Level Metadata
 ** Operation name ( record level ), such as Upsert or Insert, for the last operation performed on the row
 # Transaction Level Metadata ( logged at the Hudi level, not the table level )
 ** Source ( Kafka topic name, S3 URL of the source data, etc. )
 ** Target Hudi table name
 ** Last transaction time ( last commit time )

In short, point (1) captures details at the table level, and point (2) captures all transactions that happened at the Hudi level.

Point (1) would just be an additional column for the operation type.
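
To illustrate point (1), here is a minimal Python sketch that does not use any Hudi API. The column name `_operation` and the helper `tag_with_operation` are hypothetical; the issue does not specify how the column would be named or populated.

```python
from typing import Dict, Iterable, List

# Hypothetical column name; the issue does not prescribe one.
OPERATION_COLUMN = "_operation"


def tag_with_operation(rows: Iterable[Dict], operation: str) -> List[Dict]:
    """Return copies of the rows with the last-operation column set."""
    return [{**row, OPERATION_COLUMN: operation.upper()} for row in rows]


# Illustrative rows only; the input dicts are left unmodified.
rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
tagged = tag_with_operation(rows, "upsert")
```

Each row in `tagged` carries the operation that last wrote it, which is the only table-level change point (1) asks for.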

Example for point (2): Suppose we had an ingestion from Kafka topic 'A' to Hudi table 'ingest_kafka' and another ingestion from RDBMS table 'tableA' through Sqoop to Hudi table 'RDBMSingest'. The metadata captured would then be:

 
|Source|Timestamp|Transaction Type|Target|
|Kafka - 'A'|XXXXXX|UPSERT|ingest_kafka|
|RDBMS - 'tableA'|XXXXXX|INSERT|RDBMSingest|
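
One possible shape for such a transaction-level record, sketched in plain Python. The class `TransactionRecord` and the function `to_table` are hypothetical names for illustration, not part of Hudi; the timestamp stays a placeholder as in the table above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TransactionRecord:
    """One Hudi-level transaction entry (hypothetical structure)."""
    source: str            # e.g. Kafka topic name or S3 URL
    timestamp: str         # last commit time; placeholder here
    transaction_type: str  # e.g. UPSERT, INSERT
    target: str            # target Hudi table name


def to_table(records: List[TransactionRecord]) -> str:
    """Render the records in the same pipe-delimited layout as above."""
    lines = ["|Source|Timestamp|Transaction Type|Target|"]
    for r in records:
        lines.append(f"|{r.source}|{r.timestamp}|{r.transaction_type}|{r.target}|")
    return "\n".join(lines)


log = [
    TransactionRecord("Kafka - 'A'", "XXXXXX", "UPSERT", "ingest_kafka"),
    TransactionRecord("RDBMS - 'tableA'", "XXXXXX", "INSERT", "RDBMSingest"),
]
table = to_table(log)
```

Keeping the record immutable reflects that a transaction log entry describes a commit that has already happened; where such records would be stored is left open by the issue.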

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
