Re: Best alternative for Category Type in Spark Dataframe

2017-06-17 Thread Saatvik Shah
Thanks guys, you've all given a number of options to work with. The thing is that I'm working in a production environment where it might be necessary to ensure that no one erroneously inserts new records into those specific columns which should be the Category data type. The best alternative there
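Since Spark has no native Category type, one minimal sketch of the guard being discussed (all names here are hypothetical, not from the thread) is a check that fails fast when a column carries a value outside its allowed set:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Hypothetical helper: emulate a Category constraint by rejecting any
// DataFrame whose `column` contains values outside the `allowed` set.
def assertCategorical(df: DataFrame, column: String, allowed: Set[String]): DataFrame = {
  val invalid = df.filter(!col(column).isin(allowed.toSeq: _*)).count()
  require(invalid == 0L,
    s"$invalid row(s) in column '$column' fall outside the allowed categories $allowed")
  df  // returned unchanged so the check can be chained before a write
}
```

Calling this before every write (e.g. `assertCategorical(df, "EMOTION", Set("HAPPY", "SAD", "ANGRY", "NEUTRAL", "NA"))`) would surface erroneous inserts early; it is a convention, though, not an enforced schema constraint.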

Re: Best alternative for Category Type in Spark Dataframe

2017-06-17 Thread Pralabh Kumar
Makes sense :) On Sun, Jun 18, 2017 at 8:38 AM, 颜发才(Yan Facai) wrote: > Yes, perhaps we could use SQLTransformer as well. > http://spark.apache.org/docs/latest/ml-features.html#sqltransformer

Re: Best alternative for Category Type in Spark Dataframe

2017-06-17 Thread Yan Facai
Yes, perhaps we could use SQLTransformer as well. http://spark.apache.org/docs/latest/ml-features.html#sqltransformer On Sun, Jun 18, 2017 at 10:47 AM, Pralabh Kumar wrote: > Hi Yan > > Yes, SQL is a good option, but if we have to create an ML Pipeline, then > having

Re: Best alternative for Category Type in Spark Dataframe

2017-06-17 Thread Pralabh Kumar
Hi Yan, Yes, SQL is a good option, but if we have to create an ML Pipeline, then having transformers and setting them into the pipeline stages would be the better option. Regards, Pralabh Kumar On Sun, Jun 18, 2017 at 4:23 AM, 颜发才(Yan Facai) wrote: > To filter data, how about using sql? >
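A minimal sketch of the SQLTransformer-in-a-pipeline idea (column and value names are carried over from the filter example elsewhere in this thread; the other stages are placeholders):

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.SQLTransformer

// SQLTransformer wraps a SQL statement; __THIS__ stands for the input
// DataFrame, so the filter travels with the rest of the pipeline stages.
val filterStage = new SQLTransformer().setStatement(
  "SELECT * FROM __THIS__ WHERE EMOTION IN ('HAPPY','SAD','ANGRY','NEUTRAL','NA')")

// The filter then slots in ahead of any indexers/estimators in the pipeline.
val pipeline = new Pipeline().setStages(Array(filterStage /*, indexer, model, ... */))
```

This gets the best of both suggestions in the thread: the filter is plain SQL, but it is still a transformer that fits into `Pipeline.setStages`.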

Re: Error while doing mvn release for spark 2.0.2 using scala 2.10

2017-06-17 Thread Kanagha Kumar
Hi, Bumping this up again! Why do Spark modules depend upon Scala 2.11 versions in spite of changing the pom.xml files using ./dev/change-scala-version.sh 2.10? Appreciate any quick help!! Thanks On Fri, Jun 16, 2017 at 2:59 PM, Kanagha Kumar wrote: > Hey all, > > > I'm trying to
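For what it's worth, a likely cause (a sketch based on the Spark 2.x build documentation, not confirmed from this thread): the script only rewrites the POMs, and the build must additionally be run with the scala-2.10 property so the right profile activates:

```shell
# Step 1: rewrite the pom.xml files for Scala 2.10
./dev/change-scala-version.sh 2.10

# Step 2: the build itself must also be told to use Scala 2.10,
# otherwise modules still resolve their default 2.11 dependencies
./build/mvn -Pyarn -Dscala-2.10 -DskipTests clean package
```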

Re: Best alternative for Category Type in Spark Dataframe

2017-06-17 Thread Yan Facai
To filter data, how about using sql? df.createOrReplaceTempView("df") val sqlDF = spark.sql("SELECT * FROM df WHERE EMOTION IN ('HAPPY','SAD','ANGRY','NEUTRAL','NA')") https://spark.apache.org/docs/latest/sql-programming-guide.html#sql On Fri, Jun 16, 2017 at 11:28 PM, Pralabh Kumar

Re: Spark-Kafka integration - build failing with sbt

2017-06-17 Thread karan alang
Thanks, Cody .. yes, was able to fix that. On Sat, Jun 17, 2017 at 1:18 PM, Cody Koeninger wrote: > There are different projects for different versions of Kafka, > spark-streaming-kafka-0-8 and spark-streaming-kafka-0-10 > > See http://spark.apache.org/docs/latest/streaming-kafka-integration.html

Re: Spark-Kafka integration - build failing with sbt

2017-06-17 Thread Cody Koeninger
There are different projects for different versions of Kafka, spark-streaming-kafka-0-8 and spark-streaming-kafka-0-10. See http://spark.apache.org/docs/latest/streaming-kafka-integration.html On Fri, Jun 16, 2017 at 6:51 PM, karan alang wrote: > I'm trying to compile
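In sbt terms, picking the project Cody describes comes down to one dependency line; a sketch (the version string "2.1.1" is an example and should match your Spark version):

```scala
// build.sbt -- choose exactly ONE of these, matching your Kafka broker line.
// %% appends the Scala binary suffix (_2.10 / _2.11) automatically.
libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.1.1"

// For Kafka 0.8.x brokers, use this instead:
// libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka-0-8" % "2.1.1"
```

Pulling in both artifacts at once is a common source of exactly this kind of build failure, since they provide overlapping classes.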

Build spark without hive issue, spark-sql doesn't work.

2017-06-17 Thread wuchang
I want to build Hive and Spark so that my Hive runs on the Spark engine. I chose Hive 2.3.0 and Spark 2.0.0, which are claimed to be compatible by the Hive official documentation. According to the Hive official documentation, I have to build Spark without the Hive profile to avoid the conflict between the original
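For reference, the Hive-on-Spark getting-started guide sketches the no-Hive build roughly like this (the Hadoop profile and distribution name below are examples and should match your cluster):

```shell
# Build a Spark distribution WITHOUT the -Phive profile, so Spark's bundled
# Hive classes don't conflict with the standalone Hive 2.3.0 installation.
./dev/make-distribution.sh --name "hadoop2-without-hive" --tgz \
  "-Pyarn,hadoop-provided,hadoop-2.7,parquet-provided"
```

Note that a Spark built this way has no Hive support of its own, which is why spark-sql stops working against Hive tables; that trade-off is inherent to the Hive-on-Spark setup.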

difference between spark-integrated hive and original hive

2017-06-17 Thread wuchang
I want to build Hive and Spark so that my Hive runs on the Spark engine. I chose Hive 2.3.0 and Spark 2.0.0, which are claimed to be compatible by the Hive official documentation. According to the Hive official documentation, I have to build Spark without the Hive profile to avoid the conflict between the original