Re: [PR] docs: [RFC-98] Design doc of DSv2 read support [hudi]

via GitHub Wed, 04 Mar 2026 20:06:57 -0800


danny0405 commented on code in PR #18276:
URL: https://github.com/apache/hudi/pull/18276#discussion_r2887467537



##########
rfc/rfc-98/rfc-98.md:
##########
@@ -52,25 +53,240 @@ The current implementation of Spark Datasource V2 
integration is presented in th
 
 ## Implementation
 
-<!--  -->
+The approach is hybrid: DSv2 for reads, DSv1 fallback for writes 
(`V2TableWithV1Fallback`).
+
+Overall proposed architecture for this hybrid approach is shown in the 
following schema:
+
+![Proposed approach with hybrid V1 write and V2 
read](integration_with_DSv2_read.jpg)
+
+### DataFrame API
+
+A new SPI short name `"hudi_v2"` activates the DSv2 path for reading using 
Spark DataFrame API. 
+The existing `"hudi"` path remains unchanged.
+
+<table>
+<tr>
+<th>Operation</th>
+<th>Current implementation</th>
+<th>Additional functionality proposed in this RFC</th>
+</tr>
+<tr>
+<td>Write</td>
+<td>
+<pre>
+df.write.format("hudi").mode(...).save(path)
+        v
+BaseDefaultSource (V1) -> DefaultSource
+        v
+CreatableRelationProvider.createRelation(...)
+        v
+HoodieSparkSqlWriter.write(...)
+        v
+SparkRDDWriteClient -> upsert/insert/bulk_insert
+</pre>
+</td>
+<td>
+<pre>
+df.write.format("hudi_v2").mode(...).save(path)

Review Comment:
   This is not user-friendly, though, and it might add more 
compatibility/migration burden in the long term.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
