vaibhavk1992 commented on PR #729:
URL: https://github.com/apache/incubator-xtable/pull/729#issuecomment-3369251956

   Below is a summary of the differences between the two schema responses 
(Delta Kernel vs. Delta Log), along with the differences that still remain 
between them.
   ### Comparison of Schema Responses: Delta Kernel vs Delta Log
   This document outlines the differences in schema responses when using 
**Delta Kernel** and **Delta Log** APIs to retrieve changes in a Delta table. 
The comparison highlights the structure and format of the responses, providing 
insights into how the two approaches differ.
   #### **Delta Kernel Schema Response**
   When using the `DeltaKernelIncrementalChangesState` class to retrieve 
changes, the response is a **row backed by a columnar batch**. Each row is 
exposed through the `io.delta.kernel.data.Row` interface, which provides typed 
methods to access individual fields. The response is minimal and focuses on 
the raw data representation.
   ##### **Sample Output**
   1 row is an object ==> 
io.delta.kernel.internal.data.ColumnarBatchRow@20c03e47
   ##### **Key Characteristics**
   1. **Row Representation**: Each row is an instance of `ColumnarBatchRow`, 
which provides methods to access fields like `getLong`, `getString`, 
[etc](https://docs.delta.io/api/latest/java/kernel/io/delta/kernel/data/row).
   2. **Minimal Metadata**: The response contains only the essential fields 
(e.g., `version`, `timestamp`, `commitInfo`).
   3. **Raw Data**: The schema is not enriched with additional metadata or 
actions; it is a direct representation of the data in the columnar batch.
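   The `ColumnarBatchRow@20c03e47` output above has the shape that Java's 
default `Object.toString()` produces, so the field values have to be read 
through typed accessors rather than printed directly. The sketch below 
illustrates the idea with a hypothetical `SimpleRow` interface that only mimics 
the ordinal-based getters of `io.delta.kernel.data.Row`. It is not the Kernel 
API, and the flattened field layout (`version`, `timestamp`, `operation`) and 
the values are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

final class KernelRowSketch {

    // Hypothetical, simplified stand-in for io.delta.kernel.data.Row:
    // fields are read by ordinal through typed getters.
    interface SimpleRow {
        List<String> fieldNames();
        boolean isNullAt(int ordinal);
        long getLong(int ordinal);
        String getString(int ordinal);
    }

    // Array-backed row that deliberately does not override toString(),
    // so printing it yields ClassName@hash, as in the sample output.
    static final class ArrayRow implements SimpleRow {
        private final List<String> names;
        private final Object[] values;

        ArrayRow(List<String> names, Object... values) {
            this.names = names;
            this.values = values;
        }

        @Override public List<String> fieldNames() { return names; }
        @Override public boolean isNullAt(int ordinal) { return values[ordinal] == null; }
        @Override public long getLong(int ordinal) { return (Long) values[ordinal]; }
        @Override public String getString(int ordinal) { return (String) values[ordinal]; }
    }

    // Reads each field explicitly; the ordinals are assumed to be known by the caller.
    static String describe(SimpleRow row) {
        long version = row.getLong(0);
        long timestamp = row.getLong(1);
        String operation = row.isNullAt(2) ? null : row.getString(2);
        return "version=" + version + ", timestamp=" + timestamp + ", operation=" + operation;
    }

    // Illustrative values only; not taken from a real Kernel response.
    static SimpleRow sampleRow() {
        return new ArrayRow(
            Arrays.asList("version", "timestamp", "operation"),
            2L, 1755250246050L, "WRITE");
    }

    public static void main(String[] args) {
        SimpleRow row = sampleRow();
        System.out.println(row);           // opaque ClassName@hash form
        System.out.println(describe(row)); // readable, field-by-field form
    }
}
```

   With the real Kernel `Row`, the same pattern would apply: look up the field 
ordinals from the row's schema and call the matching typed getter for each one.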
   ##### **Use Case**
   This format is suitable for low-level data processing where the focus is on 
performance and accessing raw data.
   #### **Delta Log Schema Response**
   When using the `DeltaLog.getChanges` method, the response is a **tuple** 
containing the version number and a list of actions. The actions include 
detailed metadata about the changes, such as `CommitInfo` and `AddFile`.
   ##### **Sample Output**
   (2,
   Vector(
     CommitInfo(None, 2025-08-15 15:00:46.05, None, None, WRITE, Map(mode -> 
Append, partitionBy -> []), None, None, None, Some(1), Some(Serializable), 
Some(true), Some(Map(numFiles -> 1, numOutputRows -> 50, numOutputBytes -> 
10226)), None, None, Some(Apache-Spark/3.4.2 Delta-Lake/2.4.0), 
Some(cf7b1472-4c68-4f89-aa97-c8f16512ecfc)),
     
AddFile(part-00000-e8eeadc8-4e26-46a7-8c61-0bf60e5e7ada-c000.snappy.parquet, 
Map(), 10226, 1755250246045, true, 
{"numRecords":50,"minValues":{"id":51,"firstName":"0WI98","lastName":"08VkW","gender":"Female","birthDate":"2013-02-16T21:18:43.000+05:30","level":"ERROR","date_field":"2025-08-15","timestamp_field":"2025-08-15T15:00:45.884+05:30","double_field":0.018425752795049544,"float_field":0.109567106,"long_field":-8844008067348082419,"record_field":{"nested_int":-2060061976}},"maxValues":{"id":100,"firstName":"xZnER","lastName":"ymLQw","gender":"Male","birthDate":"2023-08-07T15:06:55.000+05:30","level":"WARN","date_field":"2025-08-15","timestamp_field":"2025-08-15T15:00:45.885+05:30","double_field":0.9914942463945434,"float_field":0.9841615,"long_field":8775924211265194460,"record_field":{"nested_int":1923869027}},"nullCount":{"id":0,"firstName":25,"lastName":23,"gender":0,"birthDate":0,"level":0,"boolean_field":28,"date_field":24,"timestamp_field":26,"double_field":25,"float_field":28,"long_field":25,"binary_field":32,"primitive_map":22,"record_map":25,"primitive_list":28,"record_list":29,"record_field":{"nested_int":28}}},
 null, null)
   ))
   ##### **Key Characteristics**
   1. **Rich Metadata**: The response includes detailed metadata such as 
`CommitInfo` (e.g., operation type, timestamp, and user metadata) and `AddFile` 
(e.g., file path, size, and statistics).
   2. **Structured Actions**: Each action is represented as a specific object 
(e.g., `CommitInfo`, `AddFile`), making it easier to interpret the changes.
   3. **Verbose Output**: The response is more verbose, providing a 
comprehensive view of the changes.
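   To make the "version plus list of typed actions" shape concrete, here is a 
minimal sketch of how a consumer might walk such a response. The classes 
`Action`, `CommitInfo`, `AddFile`, and `VersionedActions` are hypothetical, 
heavily reduced stand-ins named after the objects in the sample output. They 
are not the Delta Lake classes, and the real response is a Scala tuple rather 
than this Java holder.

```java
import java.util.Arrays;
import java.util.List;

final class DeltaLogChangesSketch {

    // Hypothetical marker type standing in for Delta's action hierarchy.
    interface Action {}

    // Reduced to one field; the real CommitInfo carries many more.
    static final class CommitInfo implements Action {
        final String operation;
        CommitInfo(String operation) { this.operation = operation; }
    }

    // Reduced to path and size; the real AddFile also carries stats, partition values, etc.
    static final class AddFile implements Action {
        final String path;
        final long size;
        AddFile(String path, long size) { this.path = path; this.size = size; }
    }

    // One entry of the change stream: a version and the actions committed in it.
    static final class VersionedActions {
        final long version;
        final List<Action> actions;
        VersionedActions(long version, List<Action> actions) {
            this.version = version;
            this.actions = actions;
        }
    }

    // Sums the sizes of all files added in this version.
    static long addedBytes(VersionedActions entry) {
        long total = 0;
        for (Action action : entry.actions) {
            if (action instanceof AddFile) {
                total += ((AddFile) action).size;
            }
        }
        return total;
    }

    // Returns the operation of the first CommitInfo action, or null if there is none.
    static String operationOf(VersionedActions entry) {
        for (Action action : entry.actions) {
            if (action instanceof CommitInfo) {
                return ((CommitInfo) action).operation;
            }
        }
        return null;
    }

    // Mirrors a few values from the sample output (version 2, one 10226-byte file).
    static VersionedActions sample() {
        List<Action> actions = Arrays.asList(
            new CommitInfo("WRITE"),
            new AddFile(
                "part-00000-e8eeadc8-4e26-46a7-8c61-0bf60e5e7ada-c000.snappy.parquet",
                10226L));
        return new VersionedActions(2L, actions);
    }

    public static void main(String[] args) {
        VersionedActions entry = sample();
        System.out.println("version=" + entry.version
            + ", operation=" + operationOf(entry)
            + ", addedBytes=" + addedBytes(entry));
    }
}
```

   Because each action is its own type, the consumer can dispatch on the action 
class directly, which is the main practical difference from reading untyped 
fields out of a Kernel row.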
   
   This issue is currently **blocked**. I have raised it with the Delta team 
several times but have not received a response.
   https://delta-users.slack.com/archives/C04TRPG3LHZ/p1758730559515289 

