Re: [PR] [spark] support lateral inner join for vector search [paimon]

via GitHub Tue, 16 Jun 2026 01:20:51 -0700


JingsongLi commented on code in PR #8252:
URL: https://github.com/apache/paimon/pull/8252#discussion_r3419164239



##########
paimon-spark/paimon-spark-common/src/main/scala/org/apache/paimon/spark/execution/PaimonStrategy.scala:
##########
@@ -215,3 +232,133 @@ case class PaimonStrategy(spark: SparkSession)
     SparkShimLoader.shim.classicApi.recacheByPlan(spark, v2Relation)
   }
 }
+
+case class LateralVectorSearchExec(
+    innerTable: InnerTable,
+    columnName: String,
+    queryVectorExpr: Expression,
+    limit: Int,
+    vectorSearchOutput: Seq[Attribute],
+    projectList: Seq[NamedExpression],
+    projectOutput: Seq[Attribute],
+    child: SparkPlan)
+  extends SparkPlan {
+
+  override def children: Seq[SparkPlan] = Seq(child)
+
+  override def output: Seq[Attribute] = child.output ++ projectOutput
+
+  @transient override lazy val producedAttributes: AttributeSet = AttributeSet(vectorSearchOutput)
+
+  @transient
+  override lazy val references: AttributeSet = {
+    AttributeSet.fromAttributeSets(expressions.map(_.references)) -- producedAttributes
+  }
+
+  override protected def withNewChildrenInternal(newChildren: IndexedSeq[SparkPlan]): SparkPlan = {
+    copy(child = newChildren.head)
+  }
+
+  override protected def doExecute(): RDD[InternalRow] = {
+    child.execute().mapPartitions {
+      outerRows =>

Review Comment:
   Can batch queries be supported? Batch queries are crucial for performance. 
You can take a look at the benchmark in 
https://github.com/apache/paimon-vector-index
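
To illustrate what batching could look like here, below is a minimal sketch in plain Scala. It assumes the operator would otherwise issue one search per outer row, and it is not the PR's implementation. `BatchedLateralSearch`, `lateralJoin`, `queryVector`, `searchBatch`, and `batchSize` are hypothetical names, not Paimon or Spark APIs. The idea is to group the outer rows of a partition, send each group's query vectors in a single call, and then pair each row with its own hits.

```scala
// Hypothetical sketch: every name here is illustrative, not a Paimon or Spark API.
object BatchedLateralSearch {
  type Vec = Array[Float]

  /** Pairs each outer row with its hits, issuing one search call per batch of rows. */
  def lateralJoin[Row, Hit](
      outerRows: Iterator[Row],
      queryVector: Row => Vec,
      // Assumed contract: returns one hit list per query vector, in the same order.
      searchBatch: Seq[Vec] => Seq[Seq[Hit]],
      batchSize: Int): Iterator[(Row, Hit)] = {
    require(batchSize > 0, "batchSize must be positive")
    outerRows.grouped(batchSize).flatMap { batch =>
      // One call for the whole batch instead of one call per row.
      val results = searchBatch(batch.map(queryVector))
      // Inner-join semantics: a row with no hits produces no output.
      batch.iterator.zip(results.iterator).flatMap { case (row, hits) =>
        hits.iterator.map(hit => (row, hit))
      }
    }
  }
}
```

In an operator like the one above, the body of `mapPartitions` could pass its row iterator through such a helper. Whether the underlying index exposes a batched search entry point is an assumption to check against the linked repository.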



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
