JackieTien97 opened a new issue, #843: URL: https://github.com/apache/tsfile/issues/843
## Motivation

Apache TsFile already has ecosystem integration with Spark, and the tree-model Spark TsFile connector can be used as a reference: https://github.com/apache/iotdb-extras/tree/master/connectors/spark-tsfile

TsFile now also supports the table model, including `TableSchema`, `ColumnCategory.TAG/FIELD`, table-model read APIs, and table-model write APIs such as `TsFileWriter#registerTableSchema` and `TsFileWriter#writeTable`.

We need a Spark connector for table-model TsFile so users can read and write table-model TsFiles directly through Spark SQL/DataFrame APIs.

## Goal

Develop a Spark SQL/DataFrame connector for TsFile table model.

The connector should reuse existing TsFile Java read/write APIs as much as possible, instead of duplicating TsFile parsing or writing logic.

## Expected Scope

The initial implementation should support:

- Reading table-model TsFile files or directories into Spark DataFrames.
- Inferring or loading table schemas from TsFile metadata, including:
  - table name
  - time column
  - TAG columns
  - FIELD columns
  - TsFile data types and corresponding Spark SQL types
- Preserving table-model semantics:
  - TAG columns identify devices
  - FIELD columns represent measurements
  - null values and sparse field values are handled correctly
- Reading multiple TsFiles with compatible schemas.
- Column pruning where possible.
- Predicate pushdown where possible, especially:
  - time-range filters
  - tag filters
- Writing Spark DataFrames into table-model TsFiles, with options such as:
  - table name
  - tag columns
  - field columns
  - encoding/compression defaults if needed
- Providing user-facing examples for Spark SQL/DataFrame read and write workflows.

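
To make the schema-inference item more concrete, here is a minimal, dependency-free sketch of how table-model columns might be turned into a Spark-style schema. Everything in it is an assumption made for illustration: `TableSchemaSketch`, `TsColumn`, `SparkField`, the type mapping, the time/TAG/FIELD column ordering, and the nullability rule are hypothetical and do not come from the TsFile or Spark APIs. A real connector would read the columns from `TableSchema` and build a Spark `StructType` instead of using strings.

```scala
// Hypothetical sketch: none of these names are part of TsFile or Spark.
object TableSchemaSketch {
  // Local stand-in for the table-model column categories plus the time column.
  sealed trait Category
  case object Time extends Category
  case object Tag extends Category
  case object Field extends Category

  // tsType holds a TsFile data type name such as "INT64" or "STRING".
  final case class TsColumn(name: String, tsType: String, category: Category)

  // One entry of the resulting schema; sparkType is the name of a Spark SQL type.
  final case class SparkField(name: String, sparkType: String, nullable: Boolean)

  // Assumed mapping from TsFile type names to Spark SQL type names.
  private val typeMapping: Map[String, String] = Map(
    "BOOLEAN"   -> "BooleanType",
    "INT32"     -> "IntegerType",
    "INT64"     -> "LongType",
    "FLOAT"     -> "FloatType",
    "DOUBLE"    -> "DoubleType",
    "TEXT"      -> "StringType",
    "STRING"    -> "StringType",
    "BLOB"      -> "BinaryType",
    "DATE"      -> "DateType",
    "TIMESTAMP" -> "TimestampType"
  )

  // Returns Left with a message if any column has a type outside the mapping,
  // otherwise the fields ordered as time, then TAG columns, then FIELD columns.
  def toSparkFields(columns: Seq[TsColumn]): Either[String, Seq[SparkField]] = {
    val unsupported = columns.filterNot(c => typeMapping.contains(c.tsType))
    if (unsupported.nonEmpty) {
      Left("Unsupported TsFile types: " +
        unsupported.map(c => s"${c.name}:${c.tsType}").mkString(", "))
    } else {
      val ordered =
        columns.filter(_.category == Time) ++
          columns.filter(_.category == Tag) ++
          columns.filter(_.category == Field)
      // Assumption: only the time column is non-nullable, so sparse FIELD
      // values (and missing TAG values) can surface as nulls.
      Right(ordered.map { c =>
        SparkField(c.name, typeMapping(c.tsType), nullable = c.category != Time)
      })
    }
  }
}
```

Returning an `Either` keeps unsupported types visible to the caller rather than silently dropping columns, which seems like the safer default when inferring a schema from file metadata.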
## Proposed User Experience

Example read API:

```scala
val df = spark.read
  .format("tsfile")
  .option("model", "table")
  .option("table", "weather")
  .load("/path/to/tsfile-dir")

df.select("time", "city", "device", "temperature")
  .where("city = 'beijing'")
  .show()
```
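
A filter such as `city = 'beijing'` in the read example is the kind of predicate the scope list proposes to push down. The sketch below shows one possible way a connector could split incoming filters into those it handles itself (time-range and tag-equality filters) and those it leaves for Spark to evaluate. It is only an illustration under stated assumptions: the `Filter` case classes are local stand-ins loosely modeled on the classes in `org.apache.spark.sql.sources`, and `PushdownSketch` and `split` are hypothetical names. In Spark's DataSource V2 API the corresponding hook is `SupportsPushDownFilters#pushFilters`, which returns the filters that still need to be evaluated after the scan.

```scala
// Hypothetical sketch: local stand-ins, not the Spark filter classes.
object PushdownSketch {
  sealed trait Filter { def column: String }
  final case class EqualTo(column: String, value: Any) extends Filter
  final case class GreaterThanOrEqual(column: String, value: Any) extends Filter
  final case class LessThan(column: String, value: Any) extends Filter

  // Splits filters into (pushed, residual).
  // Assumption: any comparison on the time column and equality on a TAG
  // column can be pushed down; everything else stays with Spark.
  def split(
      filters: Seq[Filter],
      timeColumn: String,
      tagColumns: Set[String]): (Seq[Filter], Seq[Filter]) =
    filters.partition {
      case f if f.column == timeColumn => true
      case EqualTo(col, _)             => tagColumns.contains(col)
      case _                           => false
    }
}
```

For the `weather` example, a call such as `PushdownSketch.split(filters, "time", Set("city", "device"))` would place the `city` equality filter in the pushed group, while a comparison on `temperature` would remain in the residual group.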
