Hi Hongyin,

Thanks for proposing this and for putting in the effort to build it.

I think a unified CLI for TsFile can significantly improve the overall 
usability and accessibility of the ecosystem. Beyond traditional debugging and 
inspection workflows, I can also see it becoming a valuable building block for 
data exploration, automation pipelines, and emerging AI-assisted workflows, 
where quick schema discovery, metadata inspection, and sampling are often the
first steps before deeper analysis.

The proposed command set looks practical and aligns well with common data 
tooling experiences. Having a dedicated C++ implementation is also great, 
especially since it is built on top of the existing APIs and can serve as a 
stable foundation for users across environments.

One additional thought is around the Python ecosystem. While the CLI covers 
many common inspection and import scenarios, users working with TsFileDataFrame
workflows may eventually benefit from a more integrated Python experience. It
could be interesting in the future to explore whether some of the capabilities 
provided by this CLI can be reused or exposed through higher-level Python 
tooling, allowing users to move seamlessly between command-line exploration and 
dataframe-based analysis.

Overall, I think this is a valuable addition to the project and a good step 
toward making TsFile more approachable and easier to explore.

Thanks again for driving this work.

Best,
Shuolin


From: [email protected] <[email protected]> on behalf of 张洪胤 <[email protected]>
Date: Thursday, June 4, 2026 12:35
To: [email protected] <[email protected]>
Subject: [DISCUSS] tsfile-cli: a command-line tool to inspect and write TsFile

Hi all,


  About me
  --------
  I'm Zhang Hongyin, an Apache IoTDB PMC member. I've been working with the
  TsFile format and wanted an easier way to inspect .tsfile files from the
  command line, which led to the proposal below. (This is my first contribution
  to TsFile -- happy to adjust anything to match the project's conventions.)


  Motivation
  ----------
  TsFile can be inspected programmatically and with the existing print/sketch
  utilities, but there is no single, pipe-friendly command that lets users
  explore a .tsfile from the shell the way parquet-cli / pqrs do for Parquet.
  I put together "tsfile-cli", a single C++ binary (under cpp/tools/) that
  provides the common read-only verbs plus a simple CSV/TSV import. It is built
  entirely on the existing public storage::TsFileReader and
  storage::TsFileTableWriter APIs and does not modify the storage engine.


  What it does
  ------------
  Read / inspect (data goes to stdout, diagnostics to stderr, so it composes
  with awk/jq/sort):
    ls        list devices (tree model) or tables (table model)
    schema    per-series datatype / encoding / compression
    meta      file-level summary (model, counts, time range, size)
    stats     per-series count / min / max / first / last / sum (from statistics)
    head,cat  preview / stream rows, with column projection and time-range
              filters
    sample    deterministic reservoir sample
  Output formats: csv | tsv | json | table (TTY-adaptive). Exit codes 0/1/2/3.
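
To make "deterministic reservoir sample" concrete, here is a minimal sketch of one common way to get that behavior: Algorithm R driven by a fixed seed. This is an illustration only and is not taken from the PR; the function name reservoir_sample, the seed parameter, and the use of std::mt19937_64 are all assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <string>
#include <vector>

// Hypothetical helper, not the tsfile-cli implementation.
// Algorithm R: keep the first k rows, then let row i (0-based) replace a
// random slot with probability k / (i + 1). A fixed seed makes the result
// repeatable for the same input and k.
std::vector<std::string> reservoir_sample(const std::vector<std::string>& rows,
                                          std::size_t k, std::uint64_t seed) {
    std::vector<std::string> sample;
    if (k == 0) return sample;
    sample.reserve(k);
    std::mt19937_64 rng(seed);
    for (std::size_t i = 0; i < rows.size(); ++i) {
        if (i < k) {
            sample.push_back(rows[i]);  // fill the reservoir first
        } else {
            // Pick j in [0, i]. Plain modulo is used instead of
            // std::uniform_int_distribution so the output does not depend on
            // the standard library; the modulo bias is negligible at 64 bits.
            std::size_t j = static_cast<std::size_t>(rng() % (i + 1));
            if (j < k) sample[j] = rows[i];
        }
    }
    return sample;
}
```

Calling reservoir_sample(rows, 5, 42) twice on the same rows returns the same five rows, which is the property a pipe-friendly "sample" verb would need for reproducible scripts.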


  Write / import:
    write    import CSV/TSV rows into a new table-model .tsfile, using an explicit
             --columns "name:TYPE:tag|field" schema (no type inference), with
             stdin support and silent-on-success (Unix style).
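
As an illustration of the explicit --columns "name:TYPE:tag|field" idea, the sketch below parses such a string into a list of column specs. It is not the PR's parser: the comma separator between columns, the ColumnSpec struct, the parse_columns name, and the example type names are assumptions, and the type text is passed through without validation.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical types and names, for illustration only.
struct ColumnSpec {
    std::string name;
    std::string type;  // raw type text, e.g. "DOUBLE"; not validated here
    bool is_tag;       // true for "tag", false for "field"
};

// Parses "name:TYPE:tag|field" entries. Entries are assumed to be
// comma-separated; the real CLI may use a different separator.
std::vector<ColumnSpec> parse_columns(const std::string& spec) {
    std::vector<ColumnSpec> out;
    std::istringstream items(spec);
    std::string item;
    while (std::getline(items, item, ',')) {
        const std::size_t a = item.find(':');
        const std::size_t b =
            (a == std::string::npos) ? std::string::npos : item.find(':', a + 1);
        // Require both separators, a non-empty name, and a non-empty type.
        if (a == std::string::npos || b == std::string::npos || a == 0 ||
            b == a + 1) {
            throw std::invalid_argument("expected name:TYPE:tag|field, got: " + item);
        }
        const std::string role = item.substr(b + 1);
        if (role != "tag" && role != "field") {
            throw std::invalid_argument("role must be tag or field, got: " + role);
        }
        out.push_back(ColumnSpec{item.substr(0, a),
                                 item.substr(a + 1, b - a - 1),
                                 role == "tag"});
    }
    if (out.empty()) throw std::invalid_argument("empty column spec");
    return out;
}
```

For example, parse_columns("device:STRING:tag,temp:DOUBLE:field") yields two specs, the first marked as a tag and the second as a field, while an entry missing its role such as "temp:DOUBLE" is rejected rather than guessed, in keeping with the no-type-inference approach.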

  Scope and non-goals (first iteration)
  -------------------------------------
    - Read commands cover both tree and table models.
    - "write" targets the table model with CSV/TSV input only. Tree-model
      writes, JSON input, type inference, and tsfile->tsfile transforms
      (convert / merge / rewrite) are deliberately left as follow-ups.
    - Includes unit and in-process end-to-end tests (argument parsing,
      formatters, statistics, CSV/TSV parsing, and a write->read round-trip),
      plus a README under cpp/tools/.

  PR: https://github.com/apache/tsfile/pull/829

  Feedback I'd especially appreciate:
    1. Whether a unified "tsfile-cli <command>" dispatcher is a direction the
       project wants.
    2. The verb surface and option naming -- anything missing or non-idiomatic?
    3. The write command's scope and the explicit --columns schema approach.

  Thanks for taking a look!

  Best regards,
  Zhang Hongyin
