Copilot commented on code in PR #17449:
URL: https://github.com/apache/pinot/pull/17449#discussion_r2664622886
##########
AGENTS.md:
##########
@@ -0,0 +1,108 @@
+# Apache Pinot - AGENTS Guide
+
+This file provides quick, practical guidance for coding agents working in this
+repo. It is intentionally short and focused on day-to-day work.
+
+## Project overview
+- Apache Pinot is a real-time distributed OLAP datastore for low-latency
+  analytics over streaming and batch data.
+- Core runtime roles: broker (query routing), server (segment storage/execution),
+  controller (cluster metadata/management), minion (async tasks).
+
+## Repository layout (high level)
+- pinot-broker: broker query planning and scatter-gather.
+- pinot-controller: controller APIs, table/segment metadata, Helix management.
+- pinot-server: server query execution, segment loading, indexing.
+- pinot-minion: background tasks (segment conversion, purge, etc).
+- pinot-common / pinot-spi: shared utils, config, and SPI interfaces.
+- pinot-segment-local / pinot-segment-spi: segment generation, indexes, storage.
+- pinot-query-planner / pinot-query-runtime: multi-stage query (MSQ) engine.
+- pinot-connectors: external tooling to connect to Pinot
+- pinot-plugins: all pinot plugins.
+- pinot-tools: CLI and quickstart scripts.
+- pinot-integration-tests: end-to-end validation suites.
+- pinot-distribution: packaging artifacts.
+
+## pinot-plugins modules
+- pinot-input-format: input format plugin family.
+  - pinot-arrow: Apache Arrow input format support.
+  - pinot-avro: Avro input format support.
+  - pinot-avro-base: shared Avro utilities and base classes.
+  - pinot-clp-log: CLP log input format support.
+  - pinot-confluent-avro: Confluent Schema Registry Avro input support.
+  - pinot-confluent-json: Confluent Schema Registry JSON input support.
+  - pinot-confluent-protobuf: Confluent Schema Registry Protobuf input support.
+  - pinot-orc: ORC input format support.
+  - pinot-json: JSON input format support.
+  - pinot-parquet: Parquet input format support.
+  - pinot-csv: CSV input format support.
+  - pinot-thrift: Thrift input format support.
+  - pinot-protobuf: Protobuf input format support.
+- pinot-file-system: filesystem plugin family.
+  - pinot-adls: Azure Data Lake Storage (ADLS) filesystem support.
+  - pinot-hdfs: Hadoop HDFS filesystem support.
+  - pinot-gcs: Google Cloud Storage filesystem support.
+  - pinot-s3: Amazon S3 filesystem support.
+- pinot-batch-ingestion: batch ingestion plugin family.
+  - pinot-batch-ingestion-common: shared batch ingestion APIs and utilities.
+  - pinot-batch-ingestion-spark-base: shared Spark ingestion base classes.
+  - pinot-batch-ingestion-spark-2.4: Spark 2.4 ingestion implementation.
+  - pinot-batch-ingestion-spark-3: Spark 3 ingestion implementation.
+  - pinot-batch-ingestion-hadoop: Hadoop MapReduce ingestion implementation.
+  - pinot-batch-ingestion-standalone: standalone batch ingestion implementation.
+- pinot-stream-ingestion: stream ingestion plugin family.
+  - pinot-kafka-base: shared Kafka ingestion base classes.
+  - pinot-kafka-2.0: Kafka 2.x ingestion implementation.
+  - pinot-kafka-3.0: Kafka 3.x ingestion implementation.
+  - pinot-kinesis: AWS Kinesis ingestion implementation.
+  - pinot-pulsar: Apache Pulsar ingestion implementation.
+- pinot-minion-tasks: minion task plugin family.
+  - pinot-minion-builtin-tasks: built-in minion task implementations.
+- pinot-metrics: metrics reporter plugin family.
+  - pinot-dropwizard: Dropwizard Metrics reporter implementation.
+  - pinot-yammer: Yammer Metrics reporter implementation.
+  - pinot-compound-metrics: compound metrics implementation.
+- pinot-segment-writer: segment writer plugin family.
+  - pinot-segment-writer-file-based: file-based segment writer implementation.
+- pinot-segment-uploader: segment uploader plugin family.
+  - pinot-segment-uploader-default: default segment uploader implementation.
+- pinot-environment: environment provider plugin family.
+  - pinot-azure: Azure environment provider implementation.
+- pinot-timeseries-lang: time series language plugin family.
+  - pinot-timeseries-m3ql: M3QL language plugin implementation.
+- assembly-descriptor: Maven assembly descriptor for plugin packaging.
+
+## Build and test
+- Build JDK: Use JDK 11+ (CI runs 11/21); code targets Java 11.
+- Default build: `./mvnw clean install`
+- Faster dev build: `./mvnw verify -Ppinot-fastdev`
+- Full binary/shaded build:
+  `./mvnw clean install -DskipTests -Pbin-dist -Pbuild-shaded-jar`
+- Build a module with deps: `./mvnw -pl pinot-server -am test`
+- Single test example: `./mvnw -pl pinot-segment-local -Dtest=RangeIndexTest test`
+- Quickstart (after build): `build/bin/quick-start-batch.sh`
+
+## Integration tests
+- Single integration test example: `./mvnw -pl pinot-integration-tests -Dtest=OfflineClusterIntegrationTest test -am -Dsurefire.failIfNoSpecifiedTests=false`
+
+## Coding conventions and hygiene
+- Add class-level Javadoc for new classes; describe behavior and thread-safety.
+- Prefer `///` Javadoc syntax when available (JDK 23+, JEP-467); `/** ... */` is also accepted.
+- Keep license headers on all new source files.
+- Use `./mvnw license:format` to add headers to new files.
+- Preserve backward compatibility across mixed-version broker/server/controller.
+- Prefer targeted unit tests; use integration tests when behavior crosses roles.
+
+## Checkstyle config
+- Checkstyle rules and related config files live under `config/`.
+- Use `./mvnw spotless:apply` to format code and `./mvnw checkstyle:check` to validate style.

Review Comment:
On Windows, the Maven wrapper is invoked as `mvnw.cmd` rather than `./mvnw`. Consider adding a note that Windows users should run `mvnw.cmd` instead of `./mvnw`, or use platform-agnostic wording such as 'Use the Maven wrapper (`mvnw`) to format code'.

```suggestion
- Use the Maven wrapper (`./mvnw` on Unix-like systems or `mvnw.cmd` on Windows) to run `spotless:apply` to format code and `checkstyle:check` to validate style.
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
