Hi All,

What is the best way to determine the number of partitions of a DataFrame
dynamically before writing it to disk?

1) Statically pick a partition count based on what is known about the data,
and use coalesce or repartition when writing.
2) Somehow determine the record count for the entire DataFrame and divide it
by a target number of records per partition to get the partition count.
However, how can I get the total count without risking computing the
DataFrame twice (if the DataFrame is not cached and count() is used)?
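
To make option 2 concrete, here is a rough sketch of the arithmetic I have in
mind. The helper name num_partitions, the target of 1,000,000 records per
partition, and the output path are only placeholders I made up for
illustration. The commented-out part at the bottom is the usual
cache-then-count pattern in PySpark, which is exactly the step I am unsure
about.

```python
def num_partitions(total_rows: int, target_rows_per_partition: int) -> int:
    """Ceiling-divide the row count by the desired rows per partition.

    Always returns at least 1 so an empty DataFrame still gets one partition.
    """
    if target_rows_per_partition <= 0:
        raise ValueError("target_rows_per_partition must be positive")
    # Integer ceiling division, avoids floating-point rounding on large counts.
    parts = (total_rows + target_rows_per_partition - 1) // target_rows_per_partition
    return max(1, parts)


# Hypothetical usage with a PySpark DataFrame `df` (not run here):
#   df.cache()                      # keep the computed rows around
#   n = df.count()                  # first action, materializes the cache
#   df.repartition(num_partitions(n, 1_000_000)) \
#     .write.parquet("/some/output/path")   # placeholder path
```

For example, 2,500,000 records with a target of 1,000,000 per partition would
give 3 partitions.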

-- 
Regards,

Rishi Shah
