JackieTien97 opened a new issue, #843:
URL: https://github.com/apache/tsfile/issues/843

   ## Motivation
   
   Apache TsFile already has ecosystem integration with Spark, and the 
tree-model Spark TsFile connector can be used as a reference:
   https://github.com/apache/iotdb-extras/tree/master/connectors/spark-tsfile
   
   TsFile now also supports the table model, including `TableSchema`, 
`ColumnCategory.TAG/FIELD`, table-model read APIs, and table-model write APIs 
such as `TsFileWriter#registerTableSchema` and `TsFileWriter#writeTable`.
   
   We need a Spark connector for table-model TsFile so users can read and write 
table-model TsFiles directly through Spark SQL/DataFrame APIs.
   
   ## Goal
   
   Develop a Spark SQL/DataFrame connector for the TsFile table model. The 
connector should reuse the existing TsFile Java read/write APIs as much as 
possible, instead of duplicating TsFile parsing or writing logic.
   
   ## Expected Scope
   
   The initial implementation should support:
   
   - Reading table-model TsFiles, or directories of them, into Spark DataFrames.
   - Inferring or loading table schemas from TsFile metadata, including:
     - table name
     - time column
     - TAG columns
     - FIELD columns
     - TsFile data types and corresponding Spark SQL types
   - Preserving table-model semantics:
     - TAG columns identify devices
     - FIELD columns represent measurements
     - null values and sparse field values are handled correctly
   - Reading multiple TsFiles with compatible schemas.
   - Column pruning where possible.
   - Predicate pushdown where possible, especially:
     - time-range filters
     - tag filters
   - Writing Spark DataFrames into table-model TsFiles, with options such as:
     - table name
     - tag columns
     - field columns
     - encoding/compression defaults if needed
   - Providing user-facing examples for Spark SQL/DataFrame read and write 
workflows.
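
   As a rough illustration of the schema-mapping item above, here is a minimal 
sketch that turns a table-model column list into a Spark DDL schema string. 
Everything in it is an assumption made for illustration: `Column`, `Category`, 
`SchemaSketch`, and the type mapping itself are hypothetical and are not 
existing TsFile or Spark classes. In particular, exposing the time column as 
`BIGINT` and mapping TsFile `TIMESTAMP` to Spark `TIMESTAMP` are choices the 
real connector would need to revisit.

   ```scala
   // Hypothetical model of a table-model column (not a TsFile or Spark class).
   sealed trait Category
   case object Tag extends Category
   case object Field extends Category

   final case class Column(name: String, tsType: String, category: Category)

   object SchemaSketch {
     // Assumed mapping from TsFile data type names to Spark SQL DDL type names.
     private val sparkTypeOf: Map[String, String] = Map(
       "BOOLEAN"   -> "BOOLEAN",
       "INT32"     -> "INT",
       "INT64"     -> "BIGINT",
       "FLOAT"     -> "FLOAT",
       "DOUBLE"    -> "DOUBLE",
       "TEXT"      -> "STRING",
       "STRING"    -> "STRING",
       "BLOB"      -> "BINARY",
       "DATE"      -> "DATE",
       "TIMESTAMP" -> "TIMESTAMP"
     )

     // Builds a DDL string: time column first, then TAG columns, then FIELD columns.
     // Relative order within the TAG group and within the FIELD group is preserved.
     def toDdl(columns: Seq[Column], timeColumn: String = "time"): String = {
       val (tags, fields) = columns.partition(_.category == Tag)
       val time = s"`$timeColumn` BIGINT" // assumption: time exposed as a long
       val rest = (tags ++ fields).map { c =>
         val key = c.tsType.toUpperCase(java.util.Locale.ROOT)
         val sparkType = sparkTypeOf.getOrElse(
           key,
           throw new IllegalArgumentException(s"Unsupported TsFile type: ${c.tsType}"))
         s"`${c.name}` $sparkType"
       }
       (time +: rest).mkString(", ")
     }
   }
   ```

   For the `weather` table used in the read example, passing `temperature` 
(FLOAT, FIELD), `city` (STRING, TAG), and `device` (STRING, TAG) yields 
`` `time` BIGINT, `city` STRING, `device` STRING, `temperature` FLOAT ``. A 
string in this form can be passed to `DataFrameReader.schema(String)`. All 
non-time columns are left nullable so that sparse FIELD values surface as 
nulls.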
   
   ## Proposed User Experience
   
   Example read API:
   
   ```scala
   val df = spark.read
     .format("tsfile")
     .option("model", "table")
     .option("table", "weather")
     .load("/path/to/tsfile-dir")
   
   df.select("time", "city", "device", "temperature")
     .where("city = 'beijing'")
     .show()
   ```
