Sorry! My bad. I had stale Spark jars sitting on the slave nodes...

Alex
On Tue, Dec 30, 2014 at 4:39 PM, Alessandro Baretta <alexbare...@gmail.com> wrote:
> Gents,
>
> I tried #3820. It doesn't work. I'm still getting the following exceptions:
>
> Exception in thread "Thread-45" java.lang.RuntimeException: Unsupported datatype DateType
>     at scala.sys.package$.error(package.scala:27)
>     at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$fromDataType$2.apply(ParquetTypes.scala:343)
>     at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$fromDataType$2.apply(ParquetTypes.scala:292)
>     at scala.Option.getOrElse(Option.scala:120)
>     at org.apache.spark.sql.parquet.ParquetTypesConverter$.fromDataType(ParquetTypes.scala:291)
>     at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$4.apply(ParquetTypes.scala:363)
>     at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$4.apply(ParquetTypes.scala:362)
>
> I would be more than happy to fix this myself, but I would need some help wading through the code. Could anyone explain to me what exactly is needed to support a new data type in SparkSQL's Parquet storage engine?
>
> Thanks.
>
> Alex
>
> On Mon, Dec 29, 2014 at 10:20 PM, Wang, Daoyuan <daoyuan.w...@intel.com> wrote:
>
>> By adding a flag in SQLContext, I have modified #3822 to include nanoseconds now. Since passing too many flags is ugly, I now need the whole SQLContext, so that we can put more flags there.
>>
>> Thanks,
>>
>> Daoyuan
>>
>> *From:* Michael Armbrust [mailto:mich...@databricks.com]
>> *Sent:* Tuesday, December 30, 2014 10:43 AM
>> *To:* Alessandro Baretta
>> *Cc:* Wang, Daoyuan; dev@spark.apache.org
>> *Subject:* Re: Unsupported Catalyst types in Parquet
>>
>> Yeah, I saw those. The problem is that #3822 truncates timestamps that include nanoseconds.
>>
>> On Mon, Dec 29, 2014 at 5:14 PM, Alessandro Baretta <alexbare...@gmail.com> wrote:
>>
>> Michael,
>>
>> Actually, Adrian Wang already created pull requests for these issues.
>>
>> https://github.com/apache/spark/pull/3820
>> https://github.com/apache/spark/pull/3822
>>
>> What do you think?
>>
>> Alex
>>
>> On Mon, Dec 29, 2014 at 3:07 PM, Michael Armbrust <mich...@databricks.com> wrote:
>>
>> I'd love to get both of these in. There is some trickiness that I talk about on the JIRA for timestamps, since the SQL timestamp class can support nanoseconds and I don't think Parquet has a type for this. Other systems (Impala) seem to use INT96. It would be great to maybe ask on the Parquet mailing list what the plan is there, to make sure that whatever we do is going to be compatible long term.
>>
>> Michael
>>
>> On Mon, Dec 29, 2014 at 8:13 AM, Alessandro Baretta <alexbare...@gmail.com> wrote:
>>
>> Daoyuan,
>>
>> Thanks for creating the JIRAs. I need these features by... last week, so I'd be happy to take care of this myself, if only you or someone more experienced than me in the SparkSQL codebase could provide some guidance.
>>
>> Alex
>>
>> On Dec 29, 2014 12:06 AM, "Wang, Daoyuan" <daoyuan.w...@intel.com> wrote:
>>
>> Hi Alex,
>>
>> I'll create JIRA SPARK-4985 for date type support in Parquet, and SPARK-4987 for timestamp type support. For decimal type, I think we only support decimals that fit in a long.
>>
>> Thanks,
>> Daoyuan
>>
>> -----Original Message-----
>> From: Alessandro Baretta [mailto:alexbare...@gmail.com]
>> Sent: Saturday, December 27, 2014 2:47 PM
>> To: dev@spark.apache.org; Michael Armbrust
>> Subject: Unsupported Catalyst types in Parquet
>>
>> Michael,
>>
>> I'm having trouble storing my SchemaRDDs in Parquet format with SparkSQL, due to my RDDs having DateType and DecimalType fields. What would it take to add Parquet support for these Catalyst types? Are there any other Catalyst types for which there is no Parquet support?
>>
>> Alex
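[Editor's note] For readers hitting the same "Unsupported datatype DateType" error: the sketch below is a minimal, self-contained illustration (not the actual Spark source; the type names and `toParquetTypeName` are simplified stand-ins) of the pattern-match style that `ParquetTypesConverter.fromDataType` uses. A Catalyst type with no matching case, such as `DateType` here, falls through to `sys.error`, which is the `RuntimeException` seen in the stack trace above. Supporting a new type means adding a case mapping it to a Parquet primitive or logical type.

```scala
// Illustrative only: a toy model of how a Catalyst-to-Parquet type
// converter dispatches on the data type. DateType has no case, so it
// hits the catch-all error branch, mirroring the reported exception.
sealed trait CatalystType
case object IntegerType extends CatalystType
case object StringType  extends CatalystType
case object DateType    extends CatalystType

def toParquetTypeName(dt: CatalystType): String = dt match {
  case IntegerType => "INT32"            // maps to a Parquet primitive
  case StringType  => "BINARY (UTF8)"    // binary with a UTF8 annotation
  case other       => sys.error(s"Unsupported datatype $other")
}
```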
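[Editor's note] On the INT96 question Michael raises: Impala's Parquet timestamps pack full nanosecond precision into 12 bytes, which is why INT96 avoids the truncation problem discussed for #3822. A hedged sketch of that layout (my own illustration, not code from either patch under discussion): 8 little-endian bytes of nanoseconds since midnight, followed by 4 little-endian bytes of Julian day number.

```scala
import java.nio.{ByteBuffer, ByteOrder}

// Impala-style INT96 Parquet timestamp:
//   bytes 0-7  : nanoseconds within the day (little-endian Long)
//   bytes 8-11 : Julian day number (little-endian Int)
// Keeping nanoseconds separate from the day count is what preserves
// full nanosecond precision, unlike a single 64-bit epoch value.
def toInt96(julianDay: Int, nanosOfDay: Long): Array[Byte] = {
  val buf = ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN)
  buf.putLong(nanosOfDay)
  buf.putInt(julianDay)
  buf.array()
}

// Example: midnight on 1970-01-01, whose Julian day number is 2440588.
val epochMidnight = toInt96(2440588, 0L)
```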