vaibhavk1992 commented on PR #729: URL: https://github.com/apache/incubator-xtable/pull/729#issuecomment-3369251956
Below is a summary of the differences between the two schema responses (Delta Log vs Delta Kernel), including the differences that still remain between them.

### Comparison of Schema Responses: Delta Kernel vs Delta Log

This document outlines the differences in schema responses when using the **Delta Kernel** and **Delta Log** APIs to retrieve changes from a Delta table. The comparison highlights the structure and format of each response and how the two approaches differ.

#### **Delta Kernel Schema Response**

When the `DeltaKernelIncrementalChangesState` class is used to retrieve changes, the response is returned as **rows of a columnar batch**. Each row is an object implementing the `io.delta.kernel.data.Row` interface, which provides methods to access individual fields. The response is minimal and focuses on the raw data representation.

##### **Sample Output**

`1 row is an object ==> io.delta.kernel.internal.data.ColumnarBatchRow@20c03e47`

##### **Key Characteristics**

1. **Row Representation**: Each row is an instance of `ColumnarBatchRow`, which provides typed accessors such as `getLong` and `getString` (see the [`Row` API docs](https://docs.delta.io/api/latest/java/kernel/io/delta/kernel/data/row)).
2. **Minimal Metadata**: The response contains only the essential fields (e.g., `version`, `timestamp`, `commitInfo`).
3. **Raw Data**: The schema is not enriched with additional metadata or actions; it is a direct representation of the data in the columnar batch.

##### **Use Case**

This format suits low-level data processing where the focus is on performance and direct access to raw data.

#### **Delta Log Schema Response**

When the `DeltaLog.getChanges` method is used, each change is returned as a **tuple** containing the version number and a list of actions. The actions carry detailed metadata about the change, such as `CommitInfo` and `AddFile`.
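The `(version, actions)` shape can be consumed by dispatching on each action's type. The Java sketch below is illustrative only: `Action`, `CommitInfo`, `AddFile`, `VersionedActions`, and `Summary` are hypothetical records that mirror a handful of fields visible in the version-2 sample output (they are not the Delta Lake classes), and `summarize`/`matchesMetrics` are made-up helpers. It assumes Java 17 (records, sealed interfaces, pattern matching for `instanceof`).

```java
import java.util.List;
import java.util.Map;

public class DeltaLogChangesSketch {

    // Hypothetical mirrors of two action types from the sample output; NOT the Delta Lake classes.
    sealed interface Action permits CommitInfo, AddFile {}

    record CommitInfo(String operation, Map<String, String> operationMetrics) implements Action {}

    record AddFile(String path, long size, boolean dataChange) implements Action {}

    // One element of the change stream: a commit version paired with its actions.
    record VersionedActions(long version, List<Action> actions) {}

    // Per-version summary derived from the structured actions.
    record Summary(long version, String operation, int addedFiles, long addedBytes) {}

    // Dispatches on the concrete action type: the operation comes from CommitInfo,
    // the file count and byte total come from the AddFile actions.
    static Summary summarize(VersionedActions change) {
        String operation = null;
        int addedFiles = 0;
        long addedBytes = 0;
        for (Action action : change.actions()) {
            if (action instanceof CommitInfo commitInfo) {
                operation = commitInfo.operation();
            } else if (action instanceof AddFile addFile) {
                addedFiles++;
                addedBytes += addFile.size();
            }
        }
        return new Summary(change.version(), operation, addedFiles, addedBytes);
    }

    // Cross-checks the AddFile totals against CommitInfo's numFiles / numOutputBytes metrics.
    // Returns false when the version has no CommitInfo, or a metric is missing or differs.
    static boolean matchesMetrics(VersionedActions change) {
        Summary summary = summarize(change);
        for (Action action : change.actions()) {
            if (action instanceof CommitInfo commitInfo) {
                Map<String, String> metrics = commitInfo.operationMetrics();
                return metrics != null
                        && String.valueOf(summary.addedFiles()).equals(metrics.get("numFiles"))
                        && String.valueOf(summary.addedBytes()).equals(metrics.get("numOutputBytes"));
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Values copied from the version-2 sample output (stats JSON omitted).
        VersionedActions change = new VersionedActions(2L, List.of(
                new CommitInfo("WRITE", Map.of(
                        "numFiles", "1", "numOutputRows", "50", "numOutputBytes", "10226")),
                new AddFile("part-00000-e8eeadc8-4e26-46a7-8c61-0bf60e5e7ada-c000.snappy.parquet",
                        10226L, true)));
        System.out.println(summarize(change));
        System.out.println("metrics consistent: " + matchesMetrics(change));
    }
}
```

Cross-checking `numFiles` and `numOutputBytes` against the `AddFile` actions is just one example of what typed actions make straightforward; the same dispatch would extend to other action types.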
##### **Sample Output** (Delta table changes)

(2, Vector(
CommitInfo(None, 2025-08-15 15:00:46.05, None, None, WRITE, Map(mode -> Append, partitionBy -> []), None, None, None, Some(1), Some(Serializable), Some(true), Some(Map(numFiles -> 1, numOutputRows -> 50, numOutputBytes -> 10226)), None, None, Some(Apache-Spark/3.4.2 Delta-Lake/2.4.0), Some(cf7b1472-4c68-4f89-aa97-c8f16512ecfc)),
AddFile(part-00000-e8eeadc8-4e26-46a7-8c61-0bf60e5e7ada-c000.snappy.parquet, Map(), 10226, 1755250246045, true, {"numRecords":50,"minValues":{"id":51,"firstName":"0WI98","lastName":"08VkW","gender":"Female","birthDate":"2013-02-16T21:18:43.000+05:30","level":"ERROR","date_field":"2025-08-15","timestamp_field":"2025-08-15T15:00:45.884+05:30","double_field":0.018425752795049544,"float_field":0.109567106,"long_field":-8844008067348082419,"record_field":{"nested_int":-2060061976}},"maxValues":{"id":100,"firstName":"xZnER","lastName":"ymLQw","gender":"Male","birthDate":"2023-08-07T15:06:55.000+05:30","level":"WARN","date_field":"2025-08-15","timestamp_field":"2025-08-15T15:00:45.885+05:30","double_field":0.9914942463945434,"float_field":0.9841615,"long_field":8775924211265194460,"record_field":{"nested_int":1923869027}},"nullCount":{"id":0,"firstName":25,"lastName":23,"gender":0,"birthDate":0,"level":0,"boolean_field":28,"date_field":24,"timestamp_field":26,"double_field":25,"float_field":28,"long_field":25,"binary_field":32,"primitive_map":22,"record_map":25,"primitive_list":28,"record_list":29,"record_field":{"nested_int":28}}}, null, null)
))

##### **Key Characteristics**

1. **Rich Metadata**: The response includes detailed metadata such as `CommitInfo` (e.g., operation type, timestamp, and user metadata) and `AddFile` (e.g., file path, size, and statistics).
2. **Structured Actions**: Each action is represented as a specific object (e.g., `CommitInfo`, `AddFile`), which makes the changes easier to interpret.
3. **Verbose Output**: The response is more verbose, providing a comprehensive view of the changes.

This issue is currently **blocked**. I have raised it with the Delta team several times but have not received a response: https://delta-users.slack.com/archives/C04TRPG3LHZ/p1758730559515289
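On the Delta Kernel side of the comparison, `ColumnarBatchRow@20c03e47` in the sample output has the shape of Java's default `Object.toString()` (class name plus hash code), so printing a row shows no field values; they are only visible through the row's typed, ordinal-based accessors. The Java sketch below assumes a hypothetical `RowLike` interface modeled loosely on the `isNullAt`/`getLong`/`getString` accessors of `io.delta.kernel.data.Row`, plus a hypothetical three-column layout; it is not the Kernel API. Against real Kernel rows, the column list would come from the row's schema instead of being hard-coded.

```java
import java.util.List;
import java.util.StringJoiner;

public class KernelRowRenderSketch {

    // Hypothetical stand-in for an ordinal-based row such as io.delta.kernel.data.Row.
    // Only the three accessors this sketch needs are modeled; this is not the Kernel API.
    interface RowLike {
        boolean isNullAt(int ordinal);
        long getLong(int ordinal);
        String getString(int ordinal);
    }

    // Coarse column types handled by this sketch.
    enum Kind { LONG, STRING }

    // Column name and type, listed in ordinal order (a simplified, hypothetical schema).
    record Column(String name, Kind kind) {}

    // Array-backed row used only to exercise render(); index i holds the value of column i.
    record ArrayRow(Object... values) implements RowLike {
        public boolean isNullAt(int ordinal) { return values[ordinal] == null; }
        public long getLong(int ordinal) { return (Long) values[ordinal]; }
        public String getString(int ordinal) { return (String) values[ordinal]; }
    }

    // Walks the columns by ordinal and calls the getter matching each column's kind,
    // producing "name=value" pairs instead of the row object's identity string.
    static String render(List<Column> schema, RowLike row) {
        StringJoiner out = new StringJoiner(", ", "{", "}");
        for (int i = 0; i < schema.size(); i++) {
            Column col = schema.get(i);
            String value;
            if (row.isNullAt(i)) {
                value = "null";
            } else {
                value = switch (col.kind()) {
                    case LONG -> Long.toString(row.getLong(i));
                    case STRING -> row.getString(i);
                };
            }
            out.add(col.name() + "=" + value);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical layout; the values echo the version-2 CommitInfo in the Delta Log sample.
        List<Column> schema = List.of(
                new Column("version", Kind.LONG),
                new Column("operation", Kind.STRING),
                new Column("userName", Kind.STRING));
        RowLike row = new ArrayRow(2L, "WRITE", null);
        System.out.println(render(schema, row));
    }
}
```

Checking `isNullAt` before calling a typed getter mirrors the usual pattern for ordinal-based rows, where a getter on a null slot is not guaranteed to return anything meaningful.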
