Hi all,

  About me
  --------
  I'm Zhang Hongyin, an Apache IoTDB PMC member. I've been working with the
  TsFile format and wanted an easier way to inspect .tsfile files from the
  command line, which led to the proposal below. (This is my first contribution
  to TsFile -- happy to adjust anything to match the project's conventions.)


  Motivation
  ----------
  TsFile can be inspected programmatically and with the existing print/sketch
  utilities, but there is no single, pipe-friendly command that lets users
  explore a .tsfile from the shell the way parquet-cli / pqrs do for Parquet.
  I put together "tsfile-cli", a single C++ binary (under cpp/tools/) that
  provides the common read-only verbs plus a simple CSV/TSV import. It is built
  entirely on the existing public storage::TsFileReader and
  storage::TsFileTableWriter APIs and does not modify the storage engine.


  What it does
  ------------
  Read / inspect (data goes to stdout, diagnostics to stderr, so it composes
  with awk/jq/sort):
    ls       list devices (tree model) or tables (table model)
    schema   per-series datatype / encoding / compression
    meta     file-level summary (model, counts, time range, size)
    stats    per-series count / min / max / first / last / sum (from statistics)
    head,cat preview / stream rows, with column projection and time-range
             filters
    sample   deterministic reservoir sample
  Output formats: csv | tsv | json | table (TTY-adaptive). Exit codes 0/1/2/3.
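
  For readers unfamiliar with the "sample" verb's technique, here is a minimal
  sketch of a reservoir sample (Algorithm R) that is repeatable because the
  seed is fixed. This is an illustration only: the function name, the seed
  parameter, and the idea that determinism comes from a fixed seed are my
  assumptions here, not a description of the code in the PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Algorithm R: keep k items chosen (near-)uniformly from a sequence.
// ASSUMPTION: "deterministic" is achieved with a fixed seed; the PR may
// implement it differently. reservoir_sample is a hypothetical name.
template <typename T>
std::vector<T> reservoir_sample(const std::vector<T>& stream, std::size_t k,
                                std::uint64_t seed) {
    std::vector<T> reservoir;
    if (k == 0) return reservoir;
    reservoir.reserve(k);
    std::mt19937_64 rng(seed);  // engine output for a given seed is standardized
    for (std::size_t i = 0; i < stream.size(); ++i) {
        if (i < k) {
            reservoir.push_back(stream[i]);  // fill phase: take the first k items
        } else {
            // Pick j in [0, i]. Plain modulo has a negligible bias for
            // realistic row counts, but gives the same result on every
            // standard library, unlike std::uniform_int_distribution.
            std::size_t j = static_cast<std::size_t>(rng() % (i + 1));
            if (j < k) reservoir[j] = stream[i];  // replace with prob. ~k/(i+1)
        }
    }
    return reservoir;
}
```

  A real command would feed rows from the reader one at a time instead of
  holding them in a vector; the vector just keeps the sketch short.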


  Write / import:
    write    import CSV/TSV rows into a new table-model .tsfile, using an
             explicit --columns "name:TYPE:tag|field" schema (no type
             inference), with stdin support and silent-on-success (Unix style).
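
  To make the explicit-schema idea concrete, here is a rough sketch of how a
  --columns value could be parsed. It is not the PR's code: ColumnSpec, split,
  and parse_columns are hypothetical names, the comma separator between
  entries is an assumption, and TYPE is kept as plain text rather than checked
  against the real type names.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical in-memory form of one --columns entry (not the PR's type).
struct ColumnSpec {
    std::string name;
    std::string type;  // kept as text; validating type names is out of scope
    bool is_tag;       // true for "tag", false for "field"
};

// Split s on sep, keeping empty pieces so malformed specs stay detectable.
static std::vector<std::string> split(const std::string& s, char sep) {
    std::vector<std::string> parts;
    std::string::size_type start = 0;
    while (true) {
        std::string::size_type pos = s.find(sep, start);
        if (pos == std::string::npos) {
            parts.push_back(s.substr(start));
            break;
        }
        parts.push_back(s.substr(start, pos - start));
        start = pos + 1;
    }
    return parts;
}

// Parse "name:TYPE:tag" / "name:TYPE:field" entries.
// ASSUMPTION: entries are comma-separated; the PR may use another separator.
std::vector<ColumnSpec> parse_columns(const std::string& arg) {
    std::vector<ColumnSpec> out;
    for (const std::string& entry : split(arg, ',')) {
        std::vector<std::string> f = split(entry, ':');
        if (f.size() != 3 || f[0].empty() || f[1].empty()) {
            throw std::invalid_argument("bad column spec: '" + entry + "'");
        }
        if (f[2] != "tag" && f[2] != "field") {
            throw std::invalid_argument("expected tag or field: '" + entry + "'");
        }
        out.push_back({f[0], f[1], f[2] == "tag"});
    }
    return out;
}
```

  Rejecting anything that is not exactly three non-empty parts keeps errors
  loud, which seems in keeping with the "no type inference" choice above.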
             
  Scope and non-goals (first iteration)
  -------------------------------------
    - Read commands cover both tree and table models.
    - "write" targets the table model with CSV/TSV input only. Tree-model
      writes, JSON input, type inference, and tsfile->tsfile transforms
      (convert / merge / rewrite) are deliberately left as follow-ups.
    - Includes unit and in-process end-to-end tests (argument parsing,
      formatters, statistics, CSV/TSV parsing, and a write->read round-trip),
      plus a README under cpp/tools/.
      
  PR: https://github.com/apache/tsfile/pull/829
  
  Feedback I'd especially appreciate:
    1. Whether a unified "tsfile-cli <command>" dispatcher is a direction the
       project wants.
    2. The verb surface and option naming -- anything missing or non-idiomatic?
    3. The write command's scope and the explicit --columns schema approach.
    
  Thanks for taking a look!
  
  Best regards,
  Zhang Hongyin
