Re: BigDecimal problem in parquet file

2015-06-18 Thread Bipin Nag
writing wide tables. Cheng On 6/15/15 5:48 AM, Bipin Nag wrote: Hi Davies, I have tried the recent 1.4 and 1.5-snapshot to 1) open the parquet and save it again or 2) apply the schema to the rdd and save the dataframe as parquet, but now I get this error (right in the beginning
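For context, a minimal sketch of the two approaches described in that snippet, assuming the spark-shell's predefined sqlContext on Spark 1.4+; the paths, column names, and decimal precision are placeholders, not details from the thread.

```scala
import org.apache.spark.sql.types._

// Approach 1: open the existing parquet file and simply write it back out.
val df = sqlContext.read.parquet("/path/to/existing.parquet")      // placeholder path
df.write.parquet("/path/to/rewritten.parquet")

// Approach 2: apply an explicit schema to an RDD of Rows and save that as parquet.
val schema = StructType(Seq(
  StructField("id", IntegerType, nullable = false),
  StructField("amount", DecimalType(18, 2), nullable = true)       // assumed precision/scale
))
val withSchema = sqlContext.createDataFrame(df.rdd, schema)
withSchema.write.parquet("/path/to/with-schema.parquet")
```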

Re: BigDecimal problem in parquet file

2015-06-15 Thread Bipin Nag
recently. On Fri, Jun 12, 2015 at 5:38 AM, Bipin Nag bipin@gmail.com wrote: Hi Cheng, Yes, some rows contain unit instead of decimal values. I believe some rows from the original source I had don't have any value, i.e. it is null, and that shows up as unit. How does the spark-sql or parquet

Re: BigDecimal problem in parquet file

2015-06-12 Thread Bipin Nag
to change it properly. Thanks for helping out. Bipin On 12 June 2015 at 14:57, Cheng Lian lian.cs@gmail.com wrote: On 6/10/15 8:53 PM, Bipin Nag wrote: Hi Cheng, I am using the Spark 1.3.1 binary available for Hadoop 2.6. I am loading an existing parquet file, then repartitioning

Re: BigDecimal problem in parquet file

2015-06-10 Thread Bipin Nag
Hi Cheng, I am using the Spark 1.3.1 binary available for Hadoop 2.6. I am loading an existing parquet file, then repartitioning and saving it. Doing this gives the error. The code for this doesn't look like it is causing the problem. I have a feeling the source, the existing parquet file, is the culprit. I
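As a point of reference, the load → repartition → save flow being described looks roughly like this on the Spark 1.3.1 API; the paths and partition count are placeholders.

```scala
// Minimal sketch on Spark 1.3.1 (spark-shell's predefined sqlContext);
// the paths and the partition count are placeholders.
val df = sqlContext.parquetFile("/path/to/existing.parquet")
df.repartition(8).saveAsParquetFile("/path/to/repartitioned.parquet")
```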

Re: Error in using saveAsParquetFile

2015-06-09 Thread Bipin Nag
@gmail.com wrote: I suspect that Bookings and Customerdetails both have a PolicyType field, one is a string and the other is an int. Cheng On 6/8/15 9:15 PM, Bipin Nag wrote: Hi Jeetendra, Cheng I am using the following code for joining val Bookings = sqlContext.load(/home/administrator
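If the conflict really is a string PolicyType on one side and an int on the other, one common workaround is to cast one side before the join. A hedged sketch, where the cast direction and the join key (CustomerId) are assumptions, not details from the thread:

```scala
import org.apache.spark.sql.functions.col

// Make both sides agree on the type of PolicyType before joining; here the string
// side is cast to int, but the direction of the cast is an assumption. Only the
// columns shown are kept, just for the sketch.
val bookingsFixed = Bookings.select(
  col("CustomerId"),                               // assumed join key
  col("PolicyType").cast("int").as("PolicyType"))
val joined = bookingsFixed.join(Customerdetails,
  bookingsFixed("CustomerId") === Customerdetails("CustomerId"))
```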

Re: Error in using saveAsParquetFile

2015-06-08 Thread Bipin Nag
Hi Jeetendra, Cheng I am using following code for joining val Bookings = sqlContext.load(/home/administrator/stageddata/Bookings) val Customerdetails = sqlContext.load(/home/administrator/stageddata/Customerdetails) val CD = Customerdetails. where($CreatedOn 2015-04-01 00:00:00.0).
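The snippet above is cut off by the archive, and the comparison operator in the where clause appears to have been stripped; a hedged reconstruction on the Spark 1.3 API, where the operator is an assumption:

```scala
import sqlContext.implicits._   // for the $"..." column syntax

val Bookings = sqlContext.load("/home/administrator/stageddata/Bookings")
val Customerdetails = sqlContext.load("/home/administrator/stageddata/Customerdetails")
// The archived message dropped the operator; ">" here is an assumption,
// it could equally have been "<".
val CD = Customerdetails.where($"CreatedOn" > "2015-04-01 00:00:00.0")
```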

Re: How to group multiple row data ?

2015-05-01 Thread Bipin Nag
OK, consider the case where there are multiple event triggers for a given customer/vendor/product, like 1,1,2,2,3 arranged in the order of *event* *occurrence* (time stamp). So the output should be two groups, (1,2) and (1,2,3). The doublet would be the first occurrences of 1 and 2, and the triplet the later occurrences
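The archived snippet cuts off before the solution, so here is a small sketch of one possible reading of the grouping rule, in plain Scala: bucket each event by how many times its value has already been seen. For the sequence above this yields the same two groups, (1,2,3) and (1,2), although exactly which occurrences land in which group is an interpretation, not something the thread confirms.

```scala
// One reading of the rule: the n-th occurrence of a value goes into group n.
def groupByOccurrence[A](events: Seq[A]): Seq[Seq[A]] = {
  val counts = scala.collection.mutable.Map.empty[A, Int].withDefaultValue(0)
  val indexed = events.map { e =>
    val idx = counts(e)      // how many times this value has been seen before
    counts(e) = idx + 1
    (idx, e)
  }
  indexed.groupBy(_._1).toSeq.sortBy(_._1).map(_._2.map(_._2))
}

groupByOccurrence(Seq(1, 1, 2, 2, 3))  // List(List(1, 2, 3), List(1, 2))
```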

Re: Microsoft SQL jdbc support from spark sql

2015-04-07 Thread Bipin Nag
Thanks for the information. Hopefully this will happen in the near future. For now my best bet would be to export the data and import it into Spark SQL. On 7 April 2015 at 11:28, Denny Lee denny.g@gmail.com wrote: At this time, the JDBC data source is not extensible, so it cannot support SQL Server.
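A sketch of the export-then-import workaround mentioned above, assuming the SQL Server table has been dumped to CSV and that the third-party spark-csv package is available; the path and table name are placeholders.

```scala
// Load a CSV export of the SQL Server table through the spark-csv data source
// (com.databricks:spark-csv must be on the classpath); the path is a placeholder.
val exported = sqlContext.load(
  "com.databricks.spark.csv",
  Map("path" -> "/path/to/exported/table.csv", "header" -> "true"))
exported.registerTempTable("exported_table")
sqlContext.sql("SELECT COUNT(*) FROM exported_table").show()
```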