AFAICT, `FileScanRDD` calls the `FilePartition::preferredLocations()` method to get each partition's preferred locations, and that method orders hosts by how much of the partition's data they hold. If you need to sort by other criteria (vectors), I'm wondering whether here [1] would be the place to add them. Alternatively, subclassing `FilePartition` and overriding `preferredLocations()` might also work.
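To make the idea concrete, here is a self-contained sketch of the ranking as I read it at [1]: sum the bytes each host holds for the partition's files, skip "localhost", and keep the top 3 hosts. `FileSlice`, `PreferredLocationsSketch`, `byScoreThenSize` and `hostScore` are hypothetical names, not Spark API. In real code the files would be `PartitionedFile`s, and the extra criterion would live in `FilePartition::preferredLocations()` or in an override of it.

```scala
import scala.collection.mutable

// Hypothetical stand-in for Spark's PartitionedFile: only the fields the
// ranking needs (the slice's byte length and the hosts holding a replica).
final case class FileSlice(length: Long, locations: Seq[String])

object PreferredLocationsSketch {

  // Sums, per host, the bytes of this partition that are local to it
  // ("localhost" is ignored), then sorts hosts by bytes, descending.
  // Ties are broken by host name so the result is deterministic.
  private def rankByDataSize(files: Seq[FileSlice]): Seq[String] = {
    val hostToNumBytes = mutable.HashMap.empty[String, Long]
    files.foreach { file =>
      file.locations.filter(_ != "localhost").foreach { host =>
        hostToNumBytes(host) = hostToNumBytes.getOrElse(host, 0L) + file.length
      }
    }
    hostToNumBytes.toSeq
      .sortBy { case (host, numBytes) => (-numBytes, host) }
      .map { case (host, _) => host }
  }

  // Data-size ordering, keeping at most `maxHosts` hosts (3 by default).
  def byDataSize(files: Seq[FileSlice], maxHosts: Int = 3): Array[String] =
    rankByDataSize(files).take(maxHosts).toArray

  // Hypothetical extra criterion: rank by a caller-supplied host score
  // (higher first). sortBy is stable, so hosts with equal scores keep
  // their data-size order.
  def byScoreThenSize(
      files: Seq[FileSlice],
      hostScore: String => Int,
      maxHosts: Int = 3): Array[String] =
    rankByDataSize(files)
      .sortBy(hostScore)(Ordering[Int].reverse)
      .take(maxHosts)
      .toArray

  def main(args: Array[String]): Unit = {
    val files = Seq(
      FileSlice(100L, Seq("host1", "host2")),
      FileSlice(50L, Seq("host2", "host3")),
      FileSlice(10L, Seq("localhost", "host4"))
    )
    println(byDataSize(files).mkString(","))             // host2,host1,host3
    val score = Map("host4" -> 1).withDefaultValue(0)
    println(byScoreThenSize(files, score).mkString(",")) // host4,host2,host1
  }
}
```

Keeping data size as the secondary key (via the stable sort) means the extra vector only reorders hosts it actually distinguishes; everything else falls back to the existing behaviour.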
--
Cheers,
-z

[1] https://github.com/apache/spark/blob/a4195d28ae94793b793641f121e21982bf3880d1/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FilePartition.scala#L43

On Thu, 4 Jun 2020 06:40:43 +0000
Nasrulla Khan Haris <nasrulla.k...@microsoft.com.INVALID> wrote:

> HI Spark developers,
>
> I have created new format extending fileformat. I see getPrefferedLocations
> is available if newCustomRDD is created. Since fileformat is based off
> FileScanRDD which uses readfile method to read partitioned file, Is there a
> way to add desired preferredLocations ?
>
> Appreciate your responses.
>
> Thanks,
> NKH