Re: SQLcontext changing String field to Long

2015-10-12 Thread shobhit gupta
Great, that helped a lot; the issue is fixed now. :)

Thank you very much!

On Sun, Oct 11, 2015 at 12:29 PM, Yana Kadiyska <yana.kadiy...@gmail.com>
wrote:

>  In our case, we do not actually need partition inference, so the
> workaround was easy -- instead of using paths like
> rootpath/batch_id=333/... we changed them to rootpath/333/. This
> works for us because we compute the set of HDFS paths manually and then
> register a DataFrame with a SQLContext.
>
> But it seems like there is a nicer solution:
>
> http://spark.apache.org/docs/latest/sql-programming-guide.html#partition-discovery
>
> Notice that the data types of the partitioning columns are automatically
> inferred. Currently, numeric data types and string type are supported.
> Sometimes users may not want to automatically infer the data types of the
> partitioning columns. For these use cases, the automatic type inference can
> be configured by spark.sql.sources.partitionColumnTypeInference.enabled,
> which defaults to true. When type inference is disabled, string type will
> be used for the partitioning columns.
>
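Summarizing the two approaches above as a minimal Java sketch, assuming the
Spark 1.5 Java API used elsewhere in this thread (sqlContext stands for the
job's existing SQLContext; the paths and the table name are hypothetical):

// Approach 1: skip the key=value layout and point Spark at concrete
// directories, so no partition column is inferred from the paths at all.
DataFrame manual = sqlContext.read().parquet("rootpath/333", "rootpath/334");
manual.registerTempTable("batches");

// Approach 2: keep the batch_id=... layout but disable type inference,
// so partition columns come back as strings.
sqlContext.setConf("spark.sql.sources.partitionColumnTypeInference.enabled", "false");
DataFrame discovered = sqlContext.read().parquet("rootpath");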


-- 




*Regards, Shobhit Gupta.*
*"If you salute your job, you have to salute nobody. But if you pollute
your job, you have to salute everybody..!!"*


Re: SQLcontext changing String field to Long

2015-10-10 Thread Yana Kadiyska
Can you show the output of df.printSchema()? Just a guess, but I think I ran
into something similar with a column that was part of a path in Parquet.
E.g., we had an account_id in the Parquet file data itself which was of type
string, but we also named the files in the following manner:
/somepath/account_id=.../file.parquet. Since Spark uses the paths for
partition discovery, it was actually inferring that account_id is a numeric
type, and upon reading the data we ran into the exception you're describing
(this was in Spark 1.4).
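A minimal sketch of the layout and read that trigger this, with hypothetical
paths and values (Spark 1.4/1.5 Java API; sqlContext is the job's existing
SQLContext):

// HDFS layout (hypothetical):
//   /somepath/account_id=101/part-00000.parquet
//   /somepath/account_id=102/part-00000.parquet
DataFrame df = sqlContext.read().parquet("/somepath");
df.printSchema();
// Partition discovery derives account_id from the directory names and, with
// inference enabled, treats it as a numeric type even though the files
// themselves store account_id as a string -- which is what produces the
// Long -> UTF8String ClassCastException from the original post.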



Re: SQLcontext changing String field to Long

2015-10-10 Thread shobhit gupta
Here is what df.schema().toString() prints:

DF Schema is ::StructType(StructField(batch_id,StringType,true))

I think you nailed the problem: this field is part of our HDFS file
path. We have effectively partitioned our data into folders by batch_id.

How did you get around it?

Thanks for the help. :)



-- 




*Regards, Shobhit Gupta.*
*"If you salute your job, you have to salute nobody. But if you pollute
your job, you have to salute everybody..!!"*


SQLcontext changing String field to Long

2015-10-09 Thread Abhisheks
Hi there,

I have saved my records in Parquet format and am using Spark 1.5, but when
I try to fetch the columns it throws the exception
*java.lang.ClassCastException: java.lang.Long cannot be cast to
org.apache.spark.unsafe.types.UTF8String*.

This field is saved as a String while writing the Parquet files, so here is the
sample code and its output:

logger.info("troubling thing is ::" +
    sqlContext.sql(fileSelectQuery).schema().toString());
DataFrame df = sqlContext.sql(fileSelectQuery);
JavaRDD<Row> rdd2 = df.toJavaRDD();

The first line in the code (the logger) prints this:
troubling thing is ::StructType(StructField(batch_id,StringType,true))

But the moment after it, the exception comes up.

Any idea why it is treating the field as a Long? (One notable thing about the
column: its value is always a number, e.g. a timestamp.)

Any help is appreciated.




