Hi, I sometimes write convenience methods for pre-processing data frames, and
I wonder whether it makes sense to contribute them -- should they be included
in Spark itself or supplied as Spark Packages / third-party libraries?


An example: get all fields in a DataFrame schema of a certain type.

I end up writing something like getFieldsByDataType(dataFrame: DataFrame,
dataType: DataType): List[StructField], and maybe adding it to the schema
class via implicits. Something like:

dataFrame.schema.fields.filter(_.dataType == dataType)

Alternatively, should the fields collection in the schema expose a method like
"filterByDataType" so we can write:

dataFrame.schema.fields.filterByDataType(dataType)
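A minimal sketch of both proposed conveniences. Note the stand-in DataType/StructField/StructType definitions below are simplified mock-ups of Spark's org.apache.spark.sql.types, used only so the sketch runs without a Spark dependency; the method names getFieldsByDataType and filterByDataType are this post's suggestions, not existing Spark API.

```scala
// Simplified stand-ins for Spark's org.apache.spark.sql.types
// (assumption: real code would import the Spark classes instead).
sealed trait DataType
case object StringType extends DataType
case object IntegerType extends DataType
final case class StructField(name: String, dataType: DataType)
final case class StructType(fields: Array[StructField])

object SchemaOps {
  // Enrich the schema itself with the proposed getFieldsByDataType.
  implicit class RichStructType(val schema: StructType) extends AnyVal {
    def getFieldsByDataType(dt: DataType): List[StructField] =
      schema.fields.filter(_.dataType == dt).toList
  }

  // Enrich the fields array, for the
  // dataFrame.schema.fields.filterByDataType(dataType) spelling.
  implicit class RichFields(val fields: Array[StructField]) extends AnyVal {
    def filterByDataType(dt: DataType): Array[StructField] =
      fields.filter(_.dataType == dt)
  }
}

object Demo extends App {
  import SchemaOps._

  val schema = StructType(Array(
    StructField("id", IntegerType),
    StructField("name", StringType)))

  println(schema.getFieldsByDataType(StringType))
  println(schema.fields.filterByDataType(IntegerType).toList)
}
```

Implicit (extension) classes like these could live in a small third-party library without touching Spark itself, which is one answer to the Spark-vs-package question above.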
Is it useful? Is it too bloated? Would such a change be acceptable? It is a
small contribution that a junior developer could write, for example. It adds
more code, but maybe makes the library more user friendly (not that it isn't
already).

Just want to hear your thoughts on this question.


Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

