Re: Spark -- Writing to Partitioned Persistent Table

2015-10-30 Thread Bryan Jeffrey

Re: Spark -- Writing to Partitioned Persistent Table

2015-10-29 Thread Deenar Toraskar

Spark -- Writing to Partitioned Persistent Table

2015-10-28 Thread Bryan Jeffrey
Hello. I am working on a simple solution using Spark SQL. I am writing streaming data to persistent tables using a HiveContext. Writing to a persistent non-partitioned table works well - I update the table using Spark streaming, and the output is available via Hive Thrift/JDBC. I
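
For reference, a minimal sketch of the setup being described, with one SparkContext shared by the StreamingContext and the HiveContext (the application name, batch interval, and variable names are assumptions for illustration, not taken from the thread):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("windows-event-writer")
    val sc = new SparkContext(conf)
    val ssc = new StreamingContext(sc, Seconds(30))
    // One HiveContext handles all micro-batch writes, so every persistent
    // table lands in the same Hive metastore.
    val hiveContext = new HiveContext(sc)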

Re: Spark -- Writing to Partitioned Persistent Table

2015-10-28 Thread Susan Zhang
Have you tried partitionBy? Something like hiveWindowsEvents.foreachRDD( rdd => { val eventsDataFrame = rdd.toDF() eventsDataFrame.write.mode(SaveMode.Append).partitionBy("windows_event_time_bin").saveAsTable("windows_event") })
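
Fleshed out, Susan's suggestion might look like the sketch below; the case class and its fields are assumptions chosen to match the snippet above, not a confirmed schema:

    import org.apache.spark.sql.SaveMode
    import org.apache.spark.sql.hive.HiveContext
    import org.apache.spark.streaming.dstream.DStream

    // Hypothetical event type; the real schema is not shown in the thread.
    case class WindowsEvent(machinename: String, message: String, windows_event_time_bin: String)

    def writeEvents(hiveWindowsEvents: DStream[WindowsEvent], hiveContext: HiveContext): Unit = {
      import hiveContext.implicits._
      hiveWindowsEvents.foreachRDD { rdd =>
        val eventsDataFrame = rdd.toDF()
        // Append each micro-batch, writing one partition directory per time bin.
        eventsDataFrame.write
          .mode(SaveMode.Append)
          .partitionBy("windows_event_time_bin")
          .saveAsTable("windows_event")
      }
    }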

Re: Spark -- Writing to Partitioned Persistent Table

2015-10-28 Thread Bryan Jeffrey
Susan, I did give that a shot -- I'm seeing a number of oddities: (1) 'Partition By' appears to accept only lowercase alphanumeric field names. It will work for 'machinename', but not 'machineName' or 'machine_name'. (2) When partitioning with maps included in the data I get odd string conversion
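
A hedged illustration of the case-sensitivity report, plus a possible workaround of lowercasing column names before writing (df stands for the events DataFrame from the snippets above; the column names are hypothetical and the behavior is worth verifying against your own build):

    // Reported to work: an all-lowercase partition column.
    df.write.mode(SaveMode.Append).partitionBy("machinename").saveAsTable("windows_event")

    // Reported to fail in Bryan's tests: a mixed-case partition column.
    // df.write.mode(SaveMode.Append).partitionBy("machineName").saveAsTable("windows_event")

    // Possible workaround: rename every column to lowercase before partitioning.
    val lowered = df.columns.foldLeft(df)((d, c) => d.withColumnRenamed(c, c.toLowerCase))
    lowered.write.mode(SaveMode.Append).partitionBy("machinename").saveAsTable("windows_event")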

Re: Spark -- Writing to Partitioned Persistent Table

2015-10-28 Thread Bryan Jeffrey
All, One issue I'm seeing is that I start the thrift server (for jdbc access) via the following: /spark/spark-1.4.1/sbin/start-thriftserver.sh --master spark://master:7077 --hiveconf "spark.cores.max=2" After about 40 seconds the Thrift server is started and available on default port 10000. I

Re: Spark -- Writing to Partitioned Persistent Table

2015-10-28 Thread Bryan Jeffrey
The second issue I'm seeing is an OOM when writing partitioned data. I am running Spark 1.4.1, Scala 2.11, Hadoop 2.6.1 & using the Hive libraries packaged with Spark. Spark was compiled using the following: mvn -Dhadoop.version=2.6.1 -Dscala-2.11 -DskipTests -Pyarn -Phive

Re: Spark -- Writing to Partitioned Persistent Table

2015-10-28 Thread Jerry Lam
Hi Bryan, Did you read the email I sent a few days ago? There are more issues with partitionBy down the road: https://www.mail-archive.com/user@spark.apache.org/msg39512.html Best Regards, Jerry

RE: Spark -- Writing to Partitioned Persistent Table

2015-10-28 Thread Bryan
Jeffrey" <bryan.jeff...@gmail.com> Cc: "Susan Zhang" <suchenz...@gmail.com>; "user" <user@spark.apache.org> Subject: Re: Spark -- Writing to Partitioned Persistent Table Hi Bryan, Did you read the email I sent few days ago. There are more issues wit

Re: Spark -- Writing to Partitioned Persistent Table

2015-10-28 Thread Jerry Lam

Re: Spark -- Writing to Partitioned Persistent Table

2015-10-28 Thread Yana Kadiyska
For this issue in particular ( ERROR XSDB6: Another instance of Derby may have already booted the database /spark/spark-1.4.1/metastore_db) -- I think it depends on where you start your application and HiveThriftserver from. I've run into a similar issue running a driver app first, which would
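
A common way to avoid two processes competing for the same embedded Derby metastore is to serve JDBC from the streaming driver itself instead of running a separate Thrift server; a minimal sketch, assuming the hiveContext from the earlier snippets (a general pattern, not a fix confirmed in this thread):

    import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

    // Expose the driver's own HiveContext over the Thrift/JDBC interface, so
    // streaming writes and JDBC queries share a single metastore connection.
    HiveThriftServer2.startWithContext(hiveContext)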

RE: Spark -- Writing to Partitioned Persistent Table

2015-10-28 Thread Bryan