Re: SQLcontext changing String field to Long
Great, that helped a lot, the issue is fixed now. :) Thank you very much!

On Sun, Oct 11, 2015 at 12:29 PM, Yana Kadiyska <yana.kadiy...@gmail.com> wrote:

> In our case we do not actually need partition inference, so the workaround was easy: instead of using the path as rootpath/batch_id=333/... we changed the paths to rootpath/333/. This works for us because we compute the set of HDFS paths manually and then register a DataFrame into a SQLContext.
>
> But it seems like there is a nicer solution:
>
> http://spark.apache.org/docs/latest/sql-programming-guide.html#partition-discovery
>
> "Notice that the data types of the partitioning columns are automatically inferred. Currently, numeric data types and string type are supported. Sometimes users may not want to automatically infer the data types of the partitioning columns. For these use cases, the automatic type inference can be configured by spark.sql.sources.partitionColumnTypeInference.enabled, which is default to true. When type inference is disabled, string type will be used for the partitioning columns."

--
*Regards, Shobhit Gupta.*
*"If you salute your job, you have to salute nobody. But if you pollute your job, you have to salute everybody..!!"*
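For anyone landing on this thread later, the "nicer solution" from the quoted docs boils down to flipping one configuration property before reading the data. A minimal sketch, assuming a `sqlContext` variable like the one in the original post (the property name comes from the Spark docs quoted above):

```java
// Disable partition-column type inference so path segments like
// batch_id=333 are read back as strings rather than inferred longs.
// "sqlContext" is assumed to be an existing org.apache.spark.sql.SQLContext.
sqlContext.setConf("spark.sql.sources.partitionColumnTypeInference.enabled", "false");
```

The same property can also be passed at submit time, e.g. via `--conf spark.sql.sources.partitionColumnTypeInference.enabled=false`.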
Re: SQLcontext changing String field to Long
Can you show the output of df.printSchema? Just a guess, but I think I ran into something similar with a column that was part of a path in Parquet. E.g. we had an account_id in the Parquet file data itself which was of type string, but we also named the files in the following manner: /somepath/account_id=.../file.parquet. Since Spark uses the paths for partition discovery, it was actually inferring that account_id is a numeric type, and upon reading the data we ran into the exception you're describing (this is in Spark 1.4).
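The inference Yana describes can be illustrated with a small self-contained sketch. This is not Spark's actual implementation, just an assumption-labeled toy showing why an all-digit partition value such as batch_id=333 comes back typed as a long even though the column was written as a string:

```java
// Rough illustration (NOT Spark's real code) of partition-column type
// inference: a value parsed out of a path like /somepath/batch_id=333/
// is tried as a number first, and only falls back to a string.
public class PartitionInference {
    /** Returns the inferred type name for a raw partition value. */
    static String inferType(String rawValue) {
        try {
            Long.parseLong(rawValue);   // all-digit values parse as longs
            return "LongType";
        } catch (NumberFormatException e) {
            return "StringType";        // anything else stays a string
        }
    }

    public static void main(String[] args) {
        // A purely numeric batch_id is inferred as a long, which is how a
        // column written as a string can come back with a numeric type.
        System.out.println(inferType("333"));      // LongType
        System.out.println(inferType("batch-a"));  // StringType
    }
}
```

This is also why the workaround of renaming rootpath/batch_id=333/ to rootpath/333/ works: without the key=value pattern in the path, no partition column is discovered at all.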
Re: SQLcontext changing String field to Long
Here is what df.schema.toString() prints:

DF Schema is ::StructType(StructField(batch_id,StringType,true))

I think you nailed the problem: this field is part of our HDFS file path. We have more or less partitioned our data on the basis of batch_id folders.

How did you get around it?

Thanks for the help. :)
SQLcontext changing String field to Long
Hi there,

I have saved my records in Parquet format and am using Spark 1.5. But when I try to fetch the columns it throws the exception:

java.lang.ClassCastException: java.lang.Long cannot be cast to org.apache.spark.unsafe.types.UTF8String

This field is saved as a string while writing the Parquet. Here is the sample code and output:

logger.info("troubling thing is ::" + sqlContext.sql(fileSelectQuery).schema().toString());
DataFrame df = sqlContext.sql(fileSelectQuery);
JavaRDD<Row> rdd2 = df.toJavaRDD();

The first line in the code (the logger) prints this:

troubling thing is ::StructType(StructField(batch_id,StringType,true))

But the moment after it the exception comes up.

Any idea why it is treating the field as a Long? (One unique thing about the column is that it is always a number, e.g. a timestamp.)

Any help is appreciated.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/SQLcontext-changing-String-field-to-Long-tp25005.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
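For readers puzzling over the exception text itself: it is an ordinary JVM cast failure. Once the reader has decided the column holds longs, the stored values are boxed as Long, and a later cast to the string representation blows up at runtime. A minimal JVM-level illustration, with nothing Spark-specific in it:

```java
// The same class of failure as the reported error, reduced to plain Java:
// a boxed Long sitting behind an Object reference cannot be cast to String.
public class CastDemo {
    public static void main(String[] args) {
        Object value = 333L;              // the column value, boxed as a Long
        try {
            String s = (String) value;    // the cast the declared StringType implies
            System.out.println(s);
        } catch (ClassCastException e) {
            // e.g. "class java.lang.Long cannot be cast to class java.lang.String"
            System.out.println("ClassCastException: " + e.getMessage());
        }
    }
}
```

The compile-time type of `value` is Object, so the cast compiles fine and only fails when the actual runtime class is checked.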