(spark-connect-swift) branch main updated: [SPARK-51809] Support `offset` in `DataFrame`

dongjoon Tue, 15 Apr 2025 15:59:09 -0700

This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/spark-connect-swift.git



The following commit(s) were added to refs/heads/main by this push:
     new 21724f5  [SPARK-51809] Support `offset` in `DataFrame`
21724f5 is described below

commit 21724f577a686667779677aa7b0dfbf7a7fd6ca1
Author: Dongjoon Hyun <dongj...@apache.org>
AuthorDate: Wed Apr 16 07:58:55 2025 +0900

    [SPARK-51809] Support `offset` in `DataFrame`
    
    ### What changes were proposed in this pull request?
    
    This PR aims to support `offset` API in `DataFrame`.
    
    ### Why are the changes needed?
    
    For feature parity.
    
    ### Does this PR introduce _any_ user-facing change?
    
    No, this is a new addition.
    
    ### How was this patch tested?
    
    Pass the CIs.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    No.
    
    Closes #62 from dongjoon-hyun/SPARK-51809.
    
    Authored-by: Dongjoon Hyun <dongj...@apache.org>
    Signed-off-by: Dongjoon Hyun <dongj...@apache.org>
---
 Sources/SparkConnect/DataFrame.swift          |  7 +++++++
 Sources/SparkConnect/SparkConnectClient.swift | 11 +++++++++++
 Tests/SparkConnectTests/DataFrameTests.swift  | 10 ++++++++++
 3 files changed, 28 insertions(+)

diff --git a/Sources/SparkConnect/DataFrame.swift 
b/Sources/SparkConnect/DataFrame.swift
index 0c93234..be5c5e7 100644
--- a/Sources/SparkConnect/DataFrame.swift
+++ b/Sources/SparkConnect/DataFrame.swift
@@ -330,6 +330,13 @@ public actor DataFrame: Sendable {
     return DataFrame(spark: self.spark, plan: 
SparkConnectClient.getLimit(self.plan.root, n))
   }
 
+  /// Returns a new Dataset by skipping the first `n` rows.
+  /// - Parameter n: Number of rows to skip.
+  /// - Returns: A subset of the rows
+  public func offset(_ n: Int32) -> DataFrame {
+    return DataFrame(spark: self.spark, plan: 
SparkConnectClient.getOffset(self.plan.root, n))
+  }
+
   /// Returns a new ``Dataset`` by sampling a fraction of rows, using a 
user-supplied seed.
   /// - Parameters:
   ///   - withReplacement: Sample with replacement or not.
diff --git a/Sources/SparkConnect/SparkConnectClient.swift 
b/Sources/SparkConnect/SparkConnectClient.swift
index d76f533..904e76e 100644
--- a/Sources/SparkConnect/SparkConnectClient.swift
+++ b/Sources/SparkConnect/SparkConnectClient.swift
@@ -397,6 +397,17 @@ public actor SparkConnectClient {
     return plan
   }
 
+  static func getOffset(_ child: Relation, _ n: Int32) -> Plan {
+    var offset = Spark_Connect_Offset()
+    offset.input = child
+    offset.offset = n
+    var relation = Relation()
+    relation.offset = offset
+    var plan = Plan()
+    plan.opType = .root(relation)
+    return plan
+  }
+
   static func getSample(_ child: Relation, _ withReplacement: Bool, _ 
fraction: Double, _ seed: Int64) -> Plan {
     var sample = Sample()
     sample.input = child
diff --git a/Tests/SparkConnectTests/DataFrameTests.swift 
b/Tests/SparkConnectTests/DataFrameTests.swift
index aee0c93..b9c927d 100644
--- a/Tests/SparkConnectTests/DataFrameTests.swift
+++ b/Tests/SparkConnectTests/DataFrameTests.swift
@@ -218,6 +218,16 @@ struct DataFrameTests {
     await spark.stop()
   }
 
+  @Test
+  func offset() async throws {
+    let spark = try await SparkSession.builder.getOrCreate()
+    #expect(try await spark.range(10).offset(0).count() == 10)
+    #expect(try await spark.range(10).offset(1).count() == 9)
+    #expect(try await spark.range(10).offset(2).count() == 8)
+    #expect(try await spark.range(10).offset(15).count() == 0)
+    await spark.stop()
+  }
+
   @Test
   func sample() async throws {
     let spark = try await SparkSession.builder.getOrCreate()


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

(spark-connect-swift) branch main updated: [SPARK-51809] Support `offset` in `DataFrame`

Reply via email to