Hi Qiao,

Thank you for sharing the exciting news about the TsFile 2.2.0 release and the 
potential directions for community collaboration. I am particularly interested 
in the "Viewer of a TsFile" initiative and believe it could significantly 
enhance the usability and accessibility of TsFile for developers and data 
analysts.

A dedicated TsFile Viewer would allow users to intuitively inspect the internal 
structure and data content of a TsFile without writing code, for example by 
browsing devices, measurements, time-series data points, and metadata (e.g., 
schema, encoding, and compression details). This is especially valuable for 
data validation, debugging, and exploratory analysis.

Furthermore, I propose integrating format conversion capabilities (e.g., 
to/from Parquet, CSV) directly into the Viewer in the future. This would 
create a seamless workflow: users could open a TsFile, inspect its content, and 
then convert it to other common formats for further analysis in tools like 
Pandas or Spark (or vice versa) without switching between multiple tools. This 
aligns well with TsFile's goal of simplifying data operations across terminal, 
edge, and cloud environments.

Best regards,
Xuan Wang

From: Yuan Tian <[email protected]>
Date: Wednesday, December 31, 2025 09:53
To: [email protected] <[email protected]>
Subject: Re: Future Directions of Apache TsFile

Hi Qiao,

So nice to see that.

Recently, I have noticed that some data analysts who store data in a
particular file format want to query those files directly with SQL.
Taking DuckDB as an example, Parquet and CSV files can be queried with
the SQL statement below without first importing the file into DuckDB [1].
In fact, these users are likely doing some simple pre-import analysis to
decide whether the file should ultimately be imported into the database.
Until they have made that decision, they do not want to import the file,
as doing so would risk contaminating the data already in the database.

```sql
SELECT * FROM read_parquet('input.parquet');
```
Therefore, I am wondering if we can provide similar functionality in IoTDB.
Based on my research, the read_parquet function is a dynamic table-valued
function (TVF) in DuckDB. Since IoTDB's table model also supports dynamic
table-valued functions, implementing this feature should be relatively
straightforward.
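
For illustration only, a TsFile counterpart of the DuckDB query above might
look like the sketch below. The function name read_tsfile and the file path
are assumptions on my part: read_tsfile is only a placeholder name for the
proposed function, and its real name and parameters would be settled during
design.

```sql
-- Hypothetical sketch: read_tsfile is a placeholder name for the proposed
-- table-valued function, and 'input.tsfile' is an example path.
SELECT * FROM read_tsfile('input.tsfile');
```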


[1] https://duckdb.org/docs/stable/guides/file_formats/query_parquet

On Tue, Dec 30, 2025 at 8:37 PM Jialin Qiao <[email protected]> wrote:

> Hi all,
>
> With the release of TsFile 2.2.0, the project now offers
> multi-language SDKs (Python, Java, C++, C), enabling seamless data
> storage for terminal devices, real-time edge-side processing, and
> cloud-based data analysis. Its support for table models further
> simplifies data analysis and model training in Python.
>
> As AI continues to gain momentum, TsFile can serve as a foundational
> format for building industrial time-series datasets in the AI era.
>
> Here is some potential work we could do:
> 1. Deeper alignment with the Python ecosystem, such as Pandas & DataFrame.
> 2. Integration with HuggingFace Datasets.
> 3. Viewer of a TsFile.
> 4. Converter between other formats (such as Parquet, CSV, HDF5) and TsFile.
>
> Welcome further ideas to advance the TsFile community :-)
>
> Thanks,
> Jialin Qiao
>
