Hey all,

Are there any major caveats or gotchas I should be aware of when implementing a new ShardSpec?

The context: we have a datasource that is the combined result of multiple input jobs. We're trying to do write-side joining by having all of the jobs write segments for the same intervals (i.e. partitioning on both partition number and source pipeline). For now, I've modified the Spark-Druid batch ingestor (https://github.com/metamx/druid-spark-batch) to run in our various pipelines and to write out segments with identifiers of the form `dataSource_startInterval_endInterval_version_sourceName_partitionNum`. This is working without issue for loading, querying, and deleting data, but the metadata API reports an incorrect segment identifier, since it reconstructs the identifier instead of reading it from metadata (i.e. it reports identifiers of the form `dataSource_startInterval_endInterval_version_partitionNum`).

Both because we'd like this to be fully supported, and because we imagine this feature may be useful to others, I'd like to implement it via a ShardSpec.
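To make the mismatch concrete, here's a small illustrative sketch (not actual Druid code; the function and sample values are made up) of the identifier our modified ingestor writes versus the one the metadata API reconstructs:

```python
def segment_id(data_source, start, end, version, partition_num, source_name=None):
    """Join identifier tokens with underscores; source_name is the extra
    write-side-join token, omitted in the standard identifier form."""
    parts = [data_source, start, end, version]
    if source_name is not None:
        parts.append(source_name)
    parts.append(str(partition_num))
    return "_".join(parts)

# What our modified ingestor writes (extra sourceName token):
written = segment_id("events", "2016-01-01", "2016-01-02", "v1", 3,
                     source_name="clicks")
# What the metadata API reconstructs (no sourceName token):
reported = segment_id("events", "2016-01-01", "2016-01-02", "v1", 3)

print(written)   # events_2016-01-01_2016-01-02_v1_clicks_3
print(reported)  # events_2016-01-01_2016-01-02_v1_3
```

Loading, querying, and dropping all key off the real identifier, so only the reconstructed value reported by the metadata API disagrees.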
Julian