Hi,
Seeing such a heated discussion, I’d love to jump in and share some of my
recent takeaways. After spending the last seven months diving deep into the
Python ecosystem, I’ve developed a strong intuition about the future of TsFile:
Blending into the existing ecosystem is far more critical than just polishing
technical specs.
Python is now the de facto standard for AI and data analysis. People often
complain about Python being "slow," but that’s never the real deal-breaker. As
long as a tool creates genuine value, the community will always find a way to
optimize the "hot" paths—whether that’s a rewrite in C++/Rust or leveraging
SIMD/CUDA and distributed computing. Performance can be engineered later, but
user adoption can’t be forced.
The truth is, most Python developers are "lazy." They don’t want to dig into
proprietary concepts like TsFileReader or ColumnSchema in our docs [1]; they
don’t even want to import a new library named tsfile if they don’t have to. The
dream experience for a new user is to start using TsFile without actually
having to "learn" it.
Ideally, they should be able to interact with TsFile just like they do with
Pandas:
```
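# Proposed API: pandas does not provide read_tsfile / to_tsfile today.
import pandas as pd
df = pd.read_tsfile('your_file.tsfile')  # read a TsFile into a DataFrame
df.to_tsfile('your_file.tsfile')         # write the DataFrame back out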
```
I believe making TsFile a "first-class citizen" in the Pandas ecosystem is our
highest-leverage move.
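To make this a bit more concrete, here is a rough sketch of how a separate
package could get close to that experience without any change to Pandas
itself, using Pandas' accessor extension point
(pd.api.extensions.register_dataframe_accessor). Everything TsFile-specific
here is an assumption on my side: the names read_tsfile, TsFileAccessor and
to_file are hypothetical, and the backend is a CSV stand-in so the snippet
runs without the TsFile SDK. A real version would call the SDK in the two
backend helpers.
```python
import os
import tempfile

import pandas as pd


# ASSUMPTION: stand-in backend. A real package would call the TsFile Python
# SDK in these two helpers; CSV is used here only so the sketch runs as-is.
def _write_backend(df: pd.DataFrame, path: str) -> None:
    df.to_csv(path, index=False)


def _read_backend(path: str) -> pd.DataFrame:
    return pd.read_csv(path)


def read_tsfile(path: str) -> pd.DataFrame:
    """Hypothetical module-level reader, in the style of pd.read_csv."""
    return _read_backend(path)


@pd.api.extensions.register_dataframe_accessor("tsfile")
class TsFileAccessor:
    """Hypothetical accessor: adds df.tsfile.to_file(path) to every DataFrame."""

    def __init__(self, pandas_obj: pd.DataFrame) -> None:
        self._df = pandas_obj

    def to_file(self, path: str) -> None:
        _write_backend(self._df, path)


# Round trip through a temporary file.
df = pd.DataFrame({"time": [1, 2, 3], "temperature": [20.5, 21.0, 21.5]})
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.tsfile")
    df.tsfile.to_file(path)
    roundtrip = read_tsfile(path)
```
The accessor gives df.tsfile.to_file(...) rather than df.to_tsfile(...), and
the user still has to import the package that registers it. The exact
spelling in the snippet above would need either upstream support in Pandas or
monkeypatching, which is why I treat this only as a starting point.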
If we can bridge these interfaces and run "blind" performance benchmarks
against Parquet using mainstream datasets—and keep optimizing until we show a
clear edge—we won't need to push TsFile. The Python community will follow the
scent of a better tool and come to us naturally.
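For the benchmark side, a minimal harness could look like the sketch below.
It is only an illustration under my own assumptions: the benchmark function
and the format table are hypothetical, the dataset is synthetic, and CSV and
pickle stand in for the real candidates because they ship with Pandas.
Parquet (df.to_parquet / pd.read_parquet, which need pyarrow or fastparquet)
and a TsFile reader/writer would be added as further entries.
```python
import os
import tempfile
import time

import pandas as pd


def benchmark(df, formats, repeats=3):
    """Time write and read per format; keep the best of `repeats` runs."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for name, (write, read) in formats.items():
            path = os.path.join(tmp, f"data.{name}")
            write_times, read_times = [], []
            for _ in range(repeats):
                start = time.perf_counter()
                write(df, path)
                write_times.append(time.perf_counter() - start)

                start = time.perf_counter()
                read(path)
                read_times.append(time.perf_counter() - start)
            results[name] = {
                "write_s": min(write_times),
                "read_s": min(read_times),
                "bytes": os.path.getsize(path),
            }
    return results


# Placeholder formats: each entry maps a name to (writer, reader) callables.
formats = {
    "csv": (lambda d, p: d.to_csv(p, index=False), pd.read_csv),
    "pickle": (lambda d, p: d.to_pickle(p), pd.read_pickle),
}

# Synthetic time-series data; a real comparison would use public datasets.
df = pd.DataFrame(
    {"time": range(10_000), "value": [i * 0.5 for i in range(10_000)]}
)
results = benchmark(df, formats)
for name, stats in results.items():
    print(name, stats)
```
Reporting file size next to read and write time matters here, since
compression ratio is one of the places where a time-series format could
plausibly show an edge.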
[1]
https://tsfile.apache.org/zh/UserGuide/latest/QuickStart/QuickStart-PYTHON.html#%E8%AF%BB%E5%8F%96%E7%A4%BA%E4%BE%8B
Best
------------
Xinyu Tan
On 2025/12/30 12:37:20 Jialin Qiao wrote:
> Hi all,
>
> With the release of TsFile 2.2.0, the project now offers
> multi-language SDKs (Python, Java, C++, C), enabling seamless data
> storage for terminal devices, real-time edge-side processing, and
> cloud-based data analysis. Its support for table models further
> simplifies data analysis and model training in Python.
>
> As AI continues to gain momentum, TsFile can serve as a foundational
> format for building industrial time-series datasets in the AI era.
>
> Here is some potential work we could do:
> 1. Deeper alignment with the Python ecosystem, such as Pandas & DataFrame.
> 2. Integration with HuggingFace Datasets.
> 3. Viewer of a TsFile.
> 4. Converter between other formats (such as Parquet, CSV, HDF5) and TsFile.
>
> Welcome further ideas to advance the TsFile community :-)
>
> Thanks,
> Jialin Qiao
>