Hi Chris,
I’ve recently been working extensively on the C++, C, and Python parts of TsFile, so I’d like to share some context from the implementation side.

In the current design, the Python module only exposes high-level APIs, such as basic data types and read interfaces. The core logic remains entirely in the C++ layer, with Cython serving primarily as a binding layer between Python and the native implementation. From a performance perspective, our goal is for Python read performance to be close to that of C++, rather than being limited by pure Python execution.

At the moment, I’m also working to move commonly used Python-side data structures, such as Arrow and NumPy ndarrays, into C++ for native processing. Providing first-class support for these structures at the lower level can significantly reduce cross-language data copying and conversion overhead, which should further improve overall performance on the Python side.

More broadly, I believe that what differentiates TsFile from other formats in the Apache ecosystem is not just the format itself, but the data environment it is designed for. TsFile is much closer to the point where industrial time-series data is generated, and its design naturally targets device- and PLC-side scenarios. At the same time, the downstream analysis it serves tends to align more with high-throughput, machine-learning-oriented workloads.

In today’s PLC environments, TsFile is already able to store data naturally and efficiently. Strengthening Python’s analysis capabilities on top of that can help improve the end-to-end integrity of the data pipeline, from data generation to storage and analysis.

Best,
Colin Lee

At 2026-01-04 19:20:38, "Christofer Dutz" <[email protected]> wrote:
>Hi all,
>
>I would also like to give my +1 for a standalone viewer of TSFile payloads.
>This year, as soon as I have finished porting the PLC4X drivers to the new
>architecture, my next goal will be to build PLC libraries for writing TSFile
>data on PLCs directly and then to forward them to a server in regular
>intervals via MQTT. So such a viewer component would be invaluable as a tool
>to monitor what’s being written and what’s going over the wire.
>
>Regarding shifting focus more to Python: as long as this doesn’t have a
>negative impact on the Java versions, I’m fine with that. But considering the
>performance benchmarks shared on this list, I am not sure if using TSFile
>directly in Python is a good thing. I mean … its performance was always a
>tenth of that of Java, C++ etc. Making it super convenient to directly use in
>the Python toolchains, wouldn’t that make us lose one of the key benefits of
>TSFile … its performance?
>
>Chris
>
>
>From: Pengcheng Zheng <[email protected]>
>Date: Wednesday, 31 December 2025 at 15:06
>To: [email protected] <[email protected]>
>Subject: Re: Future Directions of Apache TsFile
>
>Hi all,
>
>Great discussion :) I’d like to add a bit of context based on some
>observations and discussions we’ve seen across industrial use cases,
>academic perspectives, and recent community feedback around TsFile.
>
>One direction that has emerged from these discussions is to view TsFile not
>only as an efficient time-series file format, but also as a long-term
>carrier for high-quality industrial time-series datasets, especially in
>AI-related workflows.
>
>In many industrial scenarios, the key challenge is no longer data
>collection, but how time-series data can be preserved and reused across
>different tools, languages, and modeling pipelines over a long lifecycle.
>From this perspective, clear time semantics, metadata, and efficient I/O
>matter as much as raw read/write performance.
>
>That’s why ideas like closer Python/DataFrame alignment, Hugging Face
>integration, format converters, and lightweight viewers are interesting to
>us. At the same time, we believe this should evolve incrementally and be
>driven by concrete use cases.
>
>Happy to continue the discussion.
>
>
>Thanks,
>Pengcheng
>
>
>On Wed, 31 Dec 2025 at 16:32, Jialin Qiao <[email protected]> wrote:
>
>> Hi,
>>
>> I created an issue on the Hugging Face datasets project [1]; we could also
>> discuss it here.
>>
>> [1] https://github.com/huggingface/datasets/issues/7922
>>
>> Jialin Qiao
>>
>> Tian Jiang <[email protected]> wrote on Wed, 31 Dec 2025 at 16:12:
>> >
>> > It is exciting to hear the new directions, most of which focus on
>> > integration with the AI ecosystem.
>> >
>> > I personally do not participate in much AI-related work. Nevertheless,
>> > I find it interesting to apply TsFile in as many areas as possible.
>> >
>> > If some detailed use cases can be provided, I am more than happy to
>> > join the brainstorming on evolving TsFile to the next generation.
>> >
>> > Best,
>> > Tian Jiang
>> >
>> >
>> > ---- Replied Message ----
>> > | From | Caiyin Yang <[email protected]> |
>> > | Date | 12/31/2025 15:48 |
>> > | To | <[email protected]> |
>> > | Subject | Re: Future Directions of Apache TsFile |
>> > Hi Jialin,
>> >
>> > I strongly support the integration with Hugging Face Datasets.
>> >
>> > The primary bottleneck in Time-Series AI today is not a lack of data,
>> > but the lack of standardized, high-performance data I/O. Native
>> > integration would transform TsFile into foundational infrastructure for
>> > the TS community, rather than just another file format.
>> >
>> > From our experience developing the Sundial model, such a bridge would
>> > make sharing datasets like TimeBench seamless. More importantly, it
>> > unlocks massive industrial IoT data from IoTDB directly into AI training
>> > pipelines.
>> >
>> > Let's make TsFile the "first-class citizen" for Time-Series in the AI
>> > ecosystem. I'm eager to help define the technical requirements!
>> >
>> > Best, Caiyin Yang
>> >
>> > On 2025/12/30 12:37:20 Jialin Qiao wrote:
>> > Hi all,
>> >
>> > With the release of TsFile 2.2.0, the project now offers
>> > multi-language SDKs (Python, Java, C++, C), enabling seamless data
>> > storage for terminal devices, real-time edge-side processing, and
>> > cloud-based data analysis. Its support for table models further
>> > simplifies data analysis and model training in Python.
>> >
>> > As AI continues to gain momentum, TsFile can serve as a foundational
>> > format for building industrial time-series datasets in the AI era.
>> >
>> > Here is some potential work we could do:
>> > 1. Deeper alignment with the Python ecosystem, such as Pandas and DataFrame.
>> > 2. Integration with HuggingFace Datasets.
>> > 3. Viewer of a TsFile.
>> > 4. Converter between other formats (such as Parquet, CSV, HDF5) and TsFile.
>> >
>> > Welcome further ideas to advance the TsFile community :-)
>> >
>> > Thanks,
>> > Jialin Qiao
>> >
>>
