Hi Qiao,
It is great to see this.
Recently, I have noticed that some data analysts who store data in a
particular file format want to query those files directly with SQL.
Taking DuckDB as an example, they can run a query like the following
against a Parquet or CSV file without first importing it into DuckDB [1].
Most likely they are doing some simple pre-import analysis to decide
whether the file should be imported into the database at all. Until that
decision is made, they do not want to import the file, because doing so
risks contaminating the data already in the database.
```sql
SELECT * FROM read_parquet('input.parquet');
```
Therefore, I am wondering if we can provide similar functionality in IoTDB.
Based on my research, the read_parquet function is a dynamic table-valued
function (TVF) in DuckDB. Since IoTDB's table model also supports dynamic
table-valued functions, implementing this feature should be relatively
straightforward.
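As a rough sketch for discussion only, the query could look something like
the one below. The function name read_tsfile, the file name, and the call
syntax are all hypothetical placeholders (I am assuming a TsFile input since
that is the topic of this thread); this is not an existing IoTDB API.
```sql
-- Hypothetical: read_tsfile is a placeholder name, not an existing IoTDB function.
-- The idea is to scan the file in place, without importing it into the database.
SELECT * FROM read_tsfile('input.tsfile');
```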
[1] https://duckdb.org/docs/stable/guides/file_formats/query_parquet
On Tue, Dec 30, 2025 at 8:37 PM Jialin Qiao <[email protected]> wrote:
> Hi all,
>
> With the release of TsFile 2.2.0, the project now offers
> multi-language SDKs (Python, Java, C++, C), enabling seamless data
> storage for terminal devices, real-time edge-side processing, and
> cloud-based data analysis. Its support for table models further
> simplifies data analysis and model training in Python.
>
> As AI continues to gain momentum, TsFile can serve as a foundational
> format for building industrial time-series datasets in the AI era.
>
> Here is some potential work we could do:
> 1. Deeper alignment with the Python ecosystem, such as Pandas & DataFrame.
> 2. Integration with HuggingFace Datasets.
> 3. Viewer of a TsFile.
> 4. Converter between other formats (such as Parquet, CSV, HDF5) and TsFile.
>
> Welcome further ideas to advance the TsFile community :-)
>
> Thanks,
> Jialin Qiao
>