buraksenn commented on code in PR #21398:
URL: https://github.com/apache/datafusion/pull/21398#discussion_r3051631661


##########
docs/source/library-user-guide/custom-table-providers.md:
##########
@@ -19,568 +19,939 @@
 
 # Custom Table Provider
 
-Like other areas of DataFusion, you extend DataFusion's functionality by implementing a trait. The [`TableProvider`] and associated traits allow you to implement a custom table provider, i.e. use DataFusion's other functionality with your custom data source.
-
-This section describes how to create a [`TableProvider`] and how to configure DataFusion to use it for reading.
+One of DataFusion's greatest strengths is its extensibility. If your data lives
+in a custom format, behind an API, or in a system that DataFusion does not
+natively support, you can teach DataFusion to read it by implementing a
+**custom table provider**. This post walks through the three layers you need to
+understand to design a table provider and where planning and execution work should happen.
 
 For details on how table constraints such as primary keys or unique
 constraints are handled, see [Table Constraint Enforcement](table-constraints.md).
 
-## Table Provider and Scan
-
-The [`TableProvider::scan`] method reads data from the table and is likely the most important. It returns an [`ExecutionPlan`] that DataFusion will use to read the actual data during execution of the query. The [`TableProvider::insert_into`] method is used to `INSERT` data into the table.
-
-### Scan
-
-As mentioned, [`TableProvider::scan`] returns an execution plan, and in particular a `Result<Arc<dyn ExecutionPlan>>`. The core of this is returning something that can be dynamically dispatched to an `ExecutionPlan`. And as per the general DataFusion idea, we'll need to implement it.
-
-[`tableprovider`]: https://docs.rs/datafusion/latest/datafusion/datasource/trait.TableProvider.html
-[`tableprovider::scan`]: https://docs.rs/datafusion/latest/datafusion/datasource/trait.TableProvider.html#tymethod.scan
-[`tableprovider::insert_into`]: https://docs.rs/datafusion/latest/datafusion/datasource/trait.TableProvider.html#tymethod.insert_into
-[`executionplan`]: https://docs.rs/datafusion/latest/datafusion/physical_plan/trait.ExecutionPlan.html
-
-#### Execution Plan
-
-The `ExecutionPlan` trait at its core is a way to get a stream of batches. The aptly-named `execute` method returns a `Result<SendableRecordBatchStream>`, which should be a stream of `RecordBatch`es that can be sent across threads, and has a schema that matches the data to be contained in those batches.
-
-There are many different types of `SendableRecordBatchStream` implemented in DataFusion -- you can use a pre existing one, such as `MemoryStream` (if your `RecordBatch`es are all in memory) or implement your own custom logic, depending on your usecase.
-
-Looking at the full example below:
-
-```rust
-use std::any::Any;
-use std::sync::{Arc, Mutex};
-use std::collections::{BTreeMap, HashMap};
-use datafusion::common::Result;
-use datafusion::common::tree_node::TreeNodeRecursion;
-use datafusion::arrow::datatypes::{DataType, Field, Schema, SchemaRef};
-use datafusion::physical_plan::expressions::PhysicalSortExpr;
-use datafusion::physical_plan::{
-    ExecutionPlan, SendableRecordBatchStream, DisplayAs, DisplayFormatType,
-    Statistics, PlanProperties, PhysicalExpr
-};
-use datafusion::execution::context::TaskContext;
-use datafusion::arrow::array::{UInt64Builder, UInt8Builder};
-use datafusion::physical_plan::memory::MemoryStream;
-use datafusion::arrow::record_batch::RecordBatch;
-
-/// A User, with an id and a bank account
-#[derive(Clone, Debug)]
-struct User {
-    id: u8,
-    bank_account: u64,
-}
-
-/// A custom datasource, used to represent a datastore with a single index
-#[derive(Clone, Debug)]
-pub struct CustomDataSource {
-    inner: Arc<Mutex<CustomDataSourceInner>>,
-}
-
-#[derive(Debug)]
-struct CustomDataSourceInner {
-    data: HashMap<u8, User>,
-    bank_account_index: BTreeMap<u64, u8>,
-}
+This content is based on the blog post
+[Writing Custom Table Providers in Apache DataFusion](https://datafusion.apache.org/blog/2026/03/31/writing-table-providers/)
+by [Tim Saucer](https://github.com/timsaucer).

Review Comment:
   This was fixed after the code suggestion
   (https://github.com/apache/datafusion/pull/21398#discussion_r3051293600)
   was applied.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

