Hi there!

I have a potentially large dataset (in both number of rows and columns).

I want to find the fastest way to drop the columns that are useless to me, i.e. columns containing only a single value.

What do you think would be the fastest way to do this with Spark?


I already have a solution using distinct().count() or approxCountDistinct(), but they may not be the best choice, as they require going through all the data, even when the first two values tested in a column already differ (in which case I know I can keep the column).
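To illustrate the early-exit behavior I'm after, here is a minimal plain-Python sketch (the data and function names are hypothetical, just for illustration): scanning a column stops as soon as a second distinct value appears, so non-constant columns are cheap to rule in. In Spark, a per-column analogue might be something like `df.select(c).distinct().limit(2).count() < 2`, though I'm not sure whether the plan actually short-circuits.

```python
def is_constant(values):
    """Return True if the column holds at most one distinct value.

    Stops at the first value that differs from the first one seen,
    so a non-constant column is detected without a full scan.
    """
    it = iter(values)
    try:
        first = next(it)
    except StopIteration:
        return True  # empty column: nothing to distinguish
    for v in it:
        if v != first:
            return False  # second distinct value found: keep the column
    return True

# Hypothetical "DataFrame" as a dict of column name -> values.
data = {
    "id": [1, 2, 3, 4],
    "flag": ["a", "a", "a", "a"],  # constant -> droppable
    "score": [0.5, 0.5, 0.7, 0.5],
}

droppable = [name for name, col in data.items() if is_constant(col)]
print(droppable)  # -> ['flag']
```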


Thanks for your ideas!

Julien

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
