To filter data, how about using SQL? Note that the category values must be
quoted as string literals:

df.createOrReplaceTempView("df")
val sqlDF = spark.sql("SELECT * FROM df WHERE EMOTION IN ('HAPPY','SAD','ANGRY','NEUTRAL','NA')")
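[Editor's aside: the IN-clause quoting trips people up, so here is the same filter run against an in-memory SQLite table rather than Spark. The table and column names are illustrative; the SQL semantics of IN with string literals are the same.]

```python
# Stand-in for the Spark SQL filter above: an IN-clause over string
# literals against an in-memory SQLite table (stdlib only).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE df (emotion TEXT)")
conn.executemany(
    "INSERT INTO df VALUES (?)",
    [("HAPPY",), ("SAD",), ("confused",), ("NA",)],  # "confused" is not allowed
)

# Each allowed value is a quoted string literal, exactly as in the
# Spark SQL statement above.
rows = conn.execute(
    "SELECT emotion FROM df "
    "WHERE emotion IN ('HAPPY','SAD','ANGRY','NEUTRAL','NA')"
).fetchall()
kept = [r[0] for r in rows]
print(kept)  # "confused" is filtered out
conn.close()
```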
https://spark.apache.org/docs/latest/sql-programming-guide.html#sql

On Fri, Jun 16, 2017 at 11:28 PM, Pralabh Kumar <pralabhku...@gmail.com> wrote:

> Hi Saatvik
>
> You can write your own Transformer to make sure that the column contains
> only the values you provided, and filter out rows which don't follow the
> same.
>
> Something like this:
>
> import org.apache.spark.annotation.DeveloperApi
> import org.apache.spark.ml.Transformer
> import org.apache.spark.ml.param.ParamMap
> import org.apache.spark.sql.DataFrame
> import org.apache.spark.sql.types.StructType
>
> case class CategoryTransformer(override val uid: String) extends Transformer {
>   override def transform(inputData: DataFrame): DataFrame = {
>     inputData.select("col1").filter("col1 in ('happy')")
>   }
>   override def copy(extra: ParamMap): Transformer = defaultCopy(extra)
>   @DeveloperApi
>   override def transformSchema(schema: StructType): StructType = {
>     schema
>   }
> }
>
> Usage:
>
> val data = sc.parallelize(List("abce", "happy")).toDF("col1")
> val trans = new CategoryTransformer("1")
> data.show()
> trans.transform(data).show()
>
> This transformer will make sure you always have the values you provided
> in col1.
>
> Regards
> Pralabh Kumar
>
> On Fri, Jun 16, 2017 at 8:10 PM, Saatvik Shah <saatvikshah1...@gmail.com> wrote:
>
>> Hi Pralabh,
>>
>> I want the ability to create a column such that its values are restricted
>> to a specific set of predefined values.
>> For example, suppose I have a column called EMOTION: I want to ensure
>> each row value is one of HAPPY, SAD, ANGRY, NEUTRAL, NA.
>>
>> Thanks and Regards,
>> Saatvik Shah
>>
>> On Fri, Jun 16, 2017 at 10:30 AM, Pralabh Kumar <pralabhku...@gmail.com> wrote:
>>
>>> Hi Saatvik
>>>
>>> Can you please provide an example of what exactly you want.
>>>
>>> On 16-Jun-2017 7:40 PM, "Saatvik Shah" <saatvikshah1...@gmail.com> wrote:
>>>
>>>> Hi Yan,
>>>>
>>>> Basically, the reason I was looking for the categorical datatype is as
>>>> given here
>>>> <https://pandas.pydata.org/pandas-docs/stable/categorical.html>:
>>>> the ability to fix column values to specific categories. Is it possible
>>>> to create a user-defined data type which could do so?
>>>>
>>>> Thanks and Regards,
>>>> Saatvik Shah
>>>>
>>>> On Fri, Jun 16, 2017 at 1:42 AM, 颜发才(Yan Facai) <facai....@gmail.com> wrote:
>>>>
>>>>> You can use some Transformers to handle categorical data.
>>>>> For example, StringIndexer encodes a string column of labels to a
>>>>> column of label indices:
>>>>> http://spark.apache.org/docs/latest/ml-features.html#stringindexer
>>>>>
>>>>> On Thu, Jun 15, 2017 at 10:19 PM, saatvikshah1994 <saatvikshah1...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>> I'm trying to convert a Pandas -> Spark dataframe. One of the columns
>>>>>> I have is of the Category type in Pandas. But there does not seem to
>>>>>> be support for this same type in Spark. What is the best alternative?
>>>>>>
>>>>>> --
>>>>>> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Best-alternative-for-Category-Type-in-Spark-Dataframe-tp28764.html
>>>>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>>>>
>>>>
>>>> --
>>>> Saatvik Shah,
>>>> 1st Year,
>>>> Masters in the School of Computer Science,
>>>> Carnegie Mellon University
>>>> https://saatvikshah1994.github.io/
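[Editor's aside: every suggestion in this thread (a SQL IN filter, a custom Transformer, StringIndexer plus validation) reduces to checking membership in a fixed category set. A minimal sketch of that logic in plain Python, with no Spark dependency; the function and column names are illustrative, not any Spark API.]

```python
# Sketch of the category-restriction logic discussed in the thread:
# keep only rows whose column value is in a fixed, predefined set.
ALLOWED_EMOTIONS = {"HAPPY", "SAD", "ANGRY", "NEUTRAL", "NA"}

def filter_to_categories(rows, column, allowed):
    """Drop rows whose `column` value is outside `allowed`."""
    return [row for row in rows if row.get(column) in allowed]

data = [
    {"id": 1, "EMOTION": "HAPPY"},
    {"id": 2, "EMOTION": "confused"},  # not an allowed category
    {"id": 3, "EMOTION": "NA"},
]
clean = filter_to_categories(data, "EMOTION", ALLOWED_EMOTIONS)
print([row["id"] for row in clean])  # row 2 is dropped
```

In Spark terms, this is what `df.filter(col("EMOTION").isin(...))` does; unlike a pandas Categorical, it filters bad rows rather than turning them into nulls.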