Ashish M G created HUDI-1267:
--------------------------------
Summary: Additional Metadata Details for Hudi Transactions
Key: HUDI-1267
URL: https://issues.apache.org/jira/browse/HUDI-1267
Project: Apache Hudi
Issue Type: Improvement
Reporter: Ashish M G
Whenever any of the following ingestion scenarios occurs:
# Custom data source (Kafka, for instance) -> Hudi table
# Hudi -> Hudi table
# S3 -> Hudi table
the following metadata needs to be captured:
# Table-level metadata
** Operation name (record level), such as Upsert or Insert, for the last
operation performed on the row
# Transaction-level metadata (this will be logged at the Hudi level, not the
table level)
** Source (Kafka topic name, S3 URL of the source data, etc.)
** Target Hudi table name
** Last transaction time (last commit time)
Basically, point (1) collects details at the table level, and point (2)
collects all transactions that happened at the Hudi level.
Point (1) would just be a column addition for the operation type.
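A minimal sketch of what point (1) could look like, assuming rows are plain dictionaries. The column name `last_operation`, the helper `tag_operation`, and the set of allowed operation names are hypothetical and only for illustration; this issue does not specify them, and this is not an existing Hudi API.

```python
from typing import Dict, List

# Hypothetical column name; the issue only asks for "a column addition".
OPERATION_COLUMN = "last_operation"

# Illustrative set: the issue names Upsert and Insert "etc".
ALLOWED_OPERATIONS = {"INSERT", "UPSERT"}


def tag_operation(rows: List[Dict], operation: str) -> List[Dict]:
    """Return copies of the rows with the last operation recorded per row."""
    op = operation.upper()
    if op not in ALLOWED_OPERATIONS:
        raise ValueError(f"unsupported operation: {operation!r}")
    # Copy each row so the caller's input is left unchanged.
    return [{**row, OPERATION_COLUMN: op} for row in rows]


tagged = tag_operation([{"id": 1, "name": "a"}], "upsert")
```

Each row written to the table would then carry the type of the last operation applied to it, next to its regular columns.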
Example for point (2): suppose we had an ingestion from Kafka topic 'A' to Hudi
table 'ingest_kafka' and another ingestion from RDBMS table 'tableA'
through Sqoop to Hudi table 'RDBMSingest'. The metadata captured would then be:
|Source|Timestamp|Transaction Type|Target|
|Kafka - 'A'|XXXXXX|UPSERT|ingest_kafka|
|RDBMS - 'tableA'|XXXXXX|INSERT|RDBMSingest|
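The transaction-level log in the table above could be modelled roughly as follows. This is only a sketch under stated assumptions: `TransactionRecord` and `TransactionLog` are hypothetical names, the log is kept in memory rather than persisted by Hudi, and the timestamp is left as the same `XXXXXX` placeholder used in the table.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TransactionRecord:
    """One row of the proposed transaction-level metadata (hypothetical type)."""
    source: str            # e.g. Kafka topic name or S3 URL
    timestamp: str         # last commit time
    transaction_type: str  # e.g. UPSERT, INSERT
    target: str            # target Hudi table name


class TransactionLog:
    """In-memory stand-in for a log kept at the Hudi level, not per table."""

    def __init__(self) -> None:
        self._records: List[TransactionRecord] = []

    def record(self, source: str, timestamp: str,
               transaction_type: str, target: str) -> TransactionRecord:
        entry = TransactionRecord(source, timestamp, transaction_type, target)
        self._records.append(entry)
        return entry

    def latest_for(self, target: str) -> Optional[TransactionRecord]:
        """Return the most recently recorded transaction for a target table."""
        for entry in reversed(self._records):
            if entry.target == target:
                return entry
        return None

    def as_table(self) -> str:
        """Render the log in the same pipe-delimited layout as the table above."""
        lines = ["|Source|Timestamp|Transaction Type|Target|"]
        for r in self._records:
            lines.append(
                f"|{r.source}|{r.timestamp}|{r.transaction_type}|{r.target}|")
        return "\n".join(lines)


log = TransactionLog()
log.record("Kafka - 'A'", "XXXXXX", "UPSERT", "ingest_kafka")
log.record("RDBMS - 'tableA'", "XXXXXX", "INSERT", "RDBMSingest")
```

Calling `log.as_table()` reproduces the rows shown above, and `log.latest_for("ingest_kafka")` returns the last transaction recorded for that table.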
--
This message was sent by Atlassian Jira
(v8.3.4#803005)