Subject: [Proposal] Introduce built-in Blob implementations (e.g.,
LocalFileBlob, HttpBlob) for common use cases

Hi all,

I've been reviewing the Blob design in PIP-35 ("Introduce Blob to store
multimodal data") and think it's a solid foundation for supporting
multimodal workloads in Paimon.

One area I'd like to propose for improvement is **developer experience and
ease of use**. Currently, users need to implement the `Blob` interface
themselves for custom data sources (e.g., files, HTTP URLs), which leads to
duplicated effort and potential inconsistencies.

Could we consider introducing **built-in Blob implementations** for common
scenarios? For example:

- `FileBlob`: for reading from local or mounted file systems
- `HttpBlob` / `UrlBlob`: for streaming data from HTTP/HTTPS endpoints
- `ByteArrayBlob`: for small in-memory binary objects (<1MB)

These could be exposed through a simple factory API, such as:

```java
Blob fileBlob = Blobs.fromPath("pangu|oss|file://data/image.png");  // file
Blob urlBlob = Blobs.fromUrl("https://example.com/audio.mp3");      // remote URL
Blob inlineBlob = Blobs.fromByteArray(embeddingBytes);              // inline data
```
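
To make the idea a bit more concrete, here is a rough sketch of how such built-in implementations and the factory could fit together. Everything in it is an assumption for illustration only: the `Blob` interface is a simplified stand-in (the real interface from PIP-35 may look quite different), and `fromPath` only handles local paths here, whereas `oss://` or `pangu://` paths would presumably go through Paimon's own file system abstraction.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical, simplified stand-in for the Blob interface (not Paimon's actual API).
interface Blob {
    // Opens a fresh stream over the blob content on every call.
    InputStream newInputStream() throws IOException;

    // Convenience helper: reads the whole blob into memory.
    default byte[] toBytes() throws IOException {
        try (InputStream in = newInputStream()) {
            return in.readAllBytes();
        }
    }
}

// Small in-memory binary object.
final class ByteArrayBlob implements Blob {
    private final byte[] data;

    ByteArrayBlob(byte[] data) {
        this.data = data.clone(); // defensive copy so later caller edits do not leak in
    }

    @Override
    public InputStream newInputStream() {
        return new ByteArrayInputStream(data);
    }
}

// Blob backed by a file on a local or mounted file system.
final class FileBlob implements Blob {
    private final Path path;

    FileBlob(Path path) {
        this.path = path;
    }

    @Override
    public InputStream newInputStream() throws IOException {
        return Files.newInputStream(path);
    }
}

// Blob streamed lazily from an HTTP/HTTPS endpoint.
final class HttpBlob implements Blob {
    private final URI uri;

    HttpBlob(URI uri) {
        String scheme = uri.getScheme();
        if (!"http".equalsIgnoreCase(scheme) && !"https".equalsIgnoreCase(scheme)) {
            throw new IllegalArgumentException("Expected an http(s) URL: " + uri);
        }
        this.uri = uri;
    }

    @Override
    public InputStream newInputStream() throws IOException {
        return uri.toURL().openStream(); // the connection is opened only when data is read
    }
}

// Hypothetical factory mirroring the API proposed above.
final class Blobs {
    private Blobs() {}

    static Blob fromPath(String path) {
        return new FileBlob(Paths.get(path)); // local paths only in this sketch
    }

    static Blob fromUrl(String url) {
        return new HttpBlob(URI.create(url));
    }

    static Blob fromByteArray(byte[] bytes) {
        return new ByteArrayBlob(bytes);
    }
}
```

Keeping the constructors cheap and deferring all I/O to `newInputStream()` would let a writer create many blobs up front and only pay for the data that is actually read.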

On Wed, Sep 17, 2025 at 09:58, Jingsong Li <[email protected]> wrote:

> Hi Houliang,
>
> I think it is good to merge https://github.com/apache/paimon/pull/6128 .
>
> I will cc Spark experts to review it.
>
> Best,
> Jingsong
>
> On Wed, Sep 17, 2025 at 9:26 AM Houliang Qi <[email protected]> wrote:
> >
> > Hi Jingsong, I am very happy to see the upcoming release of version 1.3.
> > I have a PR that I hope can be included in version 1.3:
> > https://github.com/apache/paimon/pull/6128
> >
> > The main goal is to support Scala 2.13 compilation in the paimon-spark
> > module. Otherwise, if an external module such as Gravitino depends on
> > paimon-spark and is compiled with Scala 2.13, while paimon-spark is
> > compiled with Scala 2.12, compilation will fail.
> >
> > The PR is complete; please review it when you have time. Thanks.
> >
> > Best,
> > Houliang
> >
> > ---- Replied Message ----
> > | From | Yunfeng Zhou<[email protected]> |
> > | Date | 9/16/2025 16:14 |
> > | To | <[email protected]> |
> > | Subject | Re: [DISCUSS] Release 1.3.0 |
> > Hi Jingsong,
> >
> > Thanks for driving the release of 1.3.0. I have a pending PR that aims to
> > improve Flink Action’s user experience, which I hope can be released with
> > Paimon 1.3:
> > https://github.com/apache/paimon/pull/6201
> >
> > This is best-effort work for Paimon 1.3 rather than a blocker. Since the
> > review has not started yet, I’m not sure how long the PR will take to
> > complete. We’ll try our best to merge it before the release is triggered.
> >
> > Best regards,
> > Yunfeng
> >
> >
> > On Sep 15, 2025, at 21:48, Jingsong Li <[email protected]> wrote:
> >
> > Hi everyone,
> >
> > As the development of Pypaimon has largely completed a stage, we can
> > start releasing version 1.3.0. It will be accompanied by the latest
> > version of Pypaimon, a pure Python implementation that is independent
> > of the JVM.
> >
> > Do you have any other blockers for 1.3.0?
> >
> > Best,
> > Jingsong
> >
>
