Hi there! I have a potentially large dataset (in terms of both number of rows and number of columns).
I want to find the fastest way to drop the columns that are useless to me, i.e. columns containing only a single unique value.
What do you think I could do to achieve this as fast as possible with Spark?
I already have a solution using distinct().count() or approxCountDistinct(), but these may not be the best choice, since they require going through all the data, even when the first two values tested for a column are already different (in which case I already know I can keep the column).
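For reference, my current approach looks roughly like the sketch below (just a sketch: df and dropConstantColumns are placeholder names, and it assumes at least one column is kept):

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.approx_count_distinct

def dropConstantColumns(df: DataFrame): DataFrame = {
  // Single aggregation pass over the data: approximate distinct count per column
  val counts = df
    .agg(
      approx_count_distinct(df.columns.head),
      df.columns.tail.map(c => approx_count_distinct(c)): _*
    )
    .head()

  // Keep only the columns whose (approximate) distinct count is greater than 1
  val keep = df.columns.zipWithIndex.collect {
    case (name, i) if counts.getLong(i) > 1 => name
  }
  df.select(keep.head, keep.tail: _*)
}

This works, but it still scans every value of every column, which is what I would like to avoid when a column can be ruled out early.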
Thanks for your ideas!
Julien