AFAICT, `FileScanRDD` gets a partition's preferred locations by calling
`FilePartition.preferredLocations()`, which returns the hosts (at most three)
ordered by how much of the partition's data each one holds. If you want to
rank hosts by other criteria, I'm wondering if here [1] would be the place to
add them. Alternatively, subclassing `FilePartition` and overriding
`preferredLocations()` might also work.
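For illustration, here is a minimal, Spark-free Scala sketch of that ranking, plus a pluggable score that an overridden `preferredLocations()` could delegate to. It assumes (my reading of [1], not a copy of it) that the default logic sums the local bytes per host, skips `localhost`, and keeps the top three hosts. `FileSlice`, `bytesPerHost`, `rankHosts` and `topHostsByBytes` are hypothetical names, not Spark APIs; `FileSlice` only models the `length` and `locations` fields of Spark's `PartitionedFile`.

```scala
// Hypothetical stand-in for Spark's PartitionedFile: only the fields the
// locality ranking needs (the slice length and the hosts holding its blocks).
final case class FileSlice(path: String, length: Long, locations: Seq[String])

object PreferredLocationsSketch {

  // Sum the bytes each host can serve locally; "localhost" carries no
  // locality information, so it is skipped.
  def bytesPerHost(files: Seq[FileSlice]): Map[String, Long] =
    files
      .flatMap(f => f.locations.filter(_ != "localhost").map(host => host -> f.length))
      .groupBy { case (host, _) => host }
      .map { case (host, pairs) => host -> pairs.map(_._2).sum }

  // Hypothetical ranking hook: order hosts by a caller-supplied score,
  // highest first, and keep at most `maxHosts` of them.
  def rankHosts(files: Seq[FileSlice], maxHosts: Int)(
      score: (String, Long) => Long): Array[String] =
    bytesPerHost(files).toSeq
      .sortBy { case (host, numBytes) => score(host, numBytes) }(Ordering[Long].reverse)
      .take(maxHosts)
      .map { case (host, _) => host }
      .toArray

  // Default-style behaviour: rank purely by local bytes, keep the top 3.
  def topHostsByBytes(files: Seq[FileSlice], maxHosts: Int = 3): Array[String] =
    rankHosts(files, maxHosts)((_, numBytes) => numBytes)

  def main(args: Array[String]): Unit = {
    val files = Seq(
      FileSlice("part-0", 300L, Seq("host-a", "host-b")),
      FileSlice("part-1", 200L, Seq("host-b", "host-c")),
      FileSlice("part-2", 50L, Seq("host-d", "localhost"))
    )
    // Local bytes: host-b = 500, host-a = 300, host-c = 200, host-d = 50
    println(topHostsByBytes(files).mkString(", ")) // host-b, host-a, host-c

    // Hypothetical custom criterion: always prefer "warm" hosts, then bytes.
    val warmHosts = Set("host-d")
    val custom = rankHosts(files, 2) { (host, numBytes) =>
      (if (warmHosts(host)) 1000000000000L else 0L) + numBytes
    }
    println(custom.mkString(", ")) // host-d, host-b
  }
}
```

In a real `FileFormat`, the custom score would still have to be wired in through one of the two options above; the sketch only shows the ranking itself.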

-- 
Cheers,
-z
[1] https://github.com/apache/spark/blob/a4195d28ae94793b793641f121e21982bf3880d1/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FilePartition.scala#L43

On Thu, 4 Jun 2020 06:40:43 +0000
Nasrulla Khan Haris <nasrulla.k...@microsoft.com.INVALID> wrote:

> Hi Spark developers,
> 
> I have created a new format extending `FileFormat`. I see that
> `getPreferredLocations` is available if a new custom RDD is created. Since
> a `FileFormat` is based off `FileScanRDD`, which uses the `readFile` method
> to read each partitioned file, is there a way to add the desired preferred
> locations?
> 
> Appreciate your responses.
> 
> Thanks,
> NKH
> 
