Re: [Spark SQL]: How to read Hive tables with Sub directories - is this supported?

2018-06-20 Thread Daniel Pires
Thanks for coming back with the solution! Sorry my suggestion did not help.

Daniel

On Wed, 20 Jun 2018, 21:46 mattl156, wrote:
> Alright, so I figured it out.
>
> When reading from and writing to Hive metastore Parquet tables, Spark SQL
> will try to use its own Parquet support instead of Hive
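For context, the behaviour Matt refers to is governed by a few Spark and Hadoop/Hive settings. A minimal sketch, assuming a Hive-enabled SparkSession (the table name my_db.events is made up, and this is not necessarily the exact change Matt made):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("hive-subdirectories")
  .enableHiveSupport()
  // Fall back to the Hive SerDe instead of Spark's native Parquet reader,
  // so the table is read through Hive's own input handling.
  .config("spark.sql.hive.convertMetastoreParquet", "false")
  // Ask the underlying Hadoop input format to recurse into subdirectories.
  .config("mapreduce.input.fileinputformat.input.dir.recursive", "true")
  .config("hive.mapred.supports.subdirectories", "true")
  .getOrCreate()

spark.sql("SELECT count(*) FROM my_db.events").show()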

Re: [Spark SQL]: How to read Hive tables with Sub directories - is this supported?

2018-06-20 Thread Daniel Pires
Hi Matt,

What I tend to do is partition by date in the following way:

s3://data-lake/pipeline1/extract_year=2018/extract_month=06/extract_day=20/file1.json

Note that the pattern is key=value for physical partitions.

When you read that like this:

spark.read.json("s3://data-lake/pipeline1/")

It will
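A small sketch of that layout and of partition discovery on read, using a local /tmp path instead of the S3 bucket above (the toy dataset is made up):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("partition-discovery").master("local[*]").getOrCreate()
import spark.implicits._

val events = Seq(
  (1, "signup", 2018, 6, 20),
  (2, "login",  2018, 6, 20)
).toDF("id", "event", "extract_year", "extract_month", "extract_day")

// Writing with partitionBy produces .../extract_year=2018/extract_month=6/extract_day=20/part-*.json
events.write
  .mode("overwrite")
  .partitionBy("extract_year", "extract_month", "extract_day")
  .json("/tmp/data-lake/pipeline1")

// Reading the root path discovers the key=value directories and re-adds the
// partition columns; filters on them only touch the matching directories.
val df = spark.read.json("/tmp/data-lake/pipeline1")
df.filter($"extract_year" === 2018 && $"extract_day" === 20).show()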

[Spark-sql Dataset] .as[SomeClass] not modifying Physical Plan

2018-06-17 Thread Daniel Pires
Hi everyone,

I am trying to understand the behaviour of .as[SomeClass] (Dataset API).

Say I have a file with Users:

case class User(id: Int, name: String, address: String, date_add: java.sql.Date)
val users = sc.parallelize(Stream.fill(100)(User(0, "test", "Test Street", new java.sql.Date(0,
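A sketch reconstructing the truncated example above (the /tmp path and column values are made up), which also illustrates that .as[User] alone does not change the physical plan; pruning only kicks in once a select or typed operation forces it:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("as-dataset").master("local[*]").getOrCreate()
import spark.implicits._

case class User(id: Int, name: String, address: String, date_add: java.sql.Date)

val users = Seq.fill(100)(User(0, "test", "Test Street", new java.sql.Date(0L)))
spark.createDataset(users).write.mode("overwrite").parquet("/tmp/users")

val df = spark.read.parquet("/tmp/users")

// .as[User] only attaches an encoder, so the scan still reads every column.
df.as[User].explain()                 // ReadSchema lists all four columns
df.as[User].map(_.name).explain()     // typed map deserializes the full row first
df.select($"name").explain()          // untyped select prunes the scan to `name`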