JulianJaffePinterest commented on pull request #10920: URL: https://github.com/apache/druid/pull/10920#issuecomment-993161134
Calling `.partitionBy` on a `DataFrameWriter` (what you get when you call `.write()` on a `DataFrame`) doesn't do anything for a v2 data source without a managed catalog, which Druid does not have (see #11929 for a recent example). The [docs](https://github.com/apache/druid/blob/8392f87236d4a9795aa4e2867eea18cdf0aeb8ec/docs/operations/spark.md#writer) have a more in-depth discussion of partitioning, but the short version is that you'll either need to partition your dataframe before calling `.write()` on it (see the sketch at the end of this comment) or use one of the `DruidDataFrame` wrapper's convenience methods. For example, in Scala:

```scala
import org.apache.druid.spark.DruidDataFrame

df.partitionAndWrite("__time", "millis", "DAY", 200000)
  .format("druid")
  .mode(SaveMode.Overwrite)
  .options(map)
  .save()
```

or in Java:

```java
// The wrapper lives in a Scala package object, so from Java it's reached via package$.MODULE$
import org.apache.druid.spark.package$;

package$.MODULE$.DruidDataFrame(dataset)
    .partitionAndWrite("__time", "millis", "DAY", 200000)
    .format("druid")
    .mode(SaveMode.Overwrite)
    .options(map)
    .save();
```

If you don't want to use implicits/wrapper classes, you can also use the partitioner directly:

```java
SingleDimensionPartitioner partitioner = new SingleDimensionPartitioner(dataset);
Dataset<Row> partitionedDataset = partitioner.partition("__time", "millis", "DAY", 200000, "dim1", true);
partitionedDataset.write().format("druid").mode(SaveMode.Overwrite).options(map).save();
```

Also, are you setting `writer.version` in your options map? I'm surprised to see the segments differ in version from partition to partition; that's what's causing the partitions to overshadow each other.
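For completeness, here's a minimal sketch of the "partition the dataframe yourself before `.write()`" route using only stock Spark APIs. It assumes `__time` holds epoch milliseconds (as in the examples above) and buckets rows by day; unlike `partitionAndWrite` or `SingleDimensionPartitioner`, it does not enforce a rows-per-segment cap, so check the linked docs for the actual partitioning requirements.

```scala
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.functions.{col, date_trunc}

// Derive a day bucket from __time (assumption: __time is a long of epoch millis).
val bucketed = df.withColumn(
  "__time_bucket",
  date_trunc("day", (col("__time") / 1000).cast("timestamp"))
)

// Co-locate each day's rows in their own Spark partition, then drop the helper
// column so it isn't written out as a dimension.
val partitioned = bucketed
  .repartition(col("__time_bucket"))
  .drop("__time_bucket")

partitioned.write.format("druid").mode(SaveMode.Overwrite).options(map).save()
```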
