Re: Using Spark SQL with multiple (avro) files

2015-01-16 Thread Michael Armbrust
I'd open an issue on GitHub asking us to allow Hadoop's glob file format for the path. On Thu, Jan 15, 2015 at 4:57 AM, David Jones letsnumsperi...@gmail.com wrote: I've tried this now. Spark can load multiple Avro files from the same directory by passing a path to a directory.
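For illustration, a Hadoop-style glob on that path would look like the sketch below; whether avroFile accepts it is exactly what the proposed issue asks for. The spark-avro import and the directory layout are assumptions, not taken from the thread.

    import com.databricks.spark.avro._

    // A Hadoop-style glob: every Avro file under every dated subdirectory.
    // This is what the requested feature would let avroFile resolve.
    val records = sqlContext.avroFile("/data/events/2015-01-*/*.avro")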

Re: Using Spark SQL with multiple (avro) files

2015-01-15 Thread David Jones
I've tried this now. Spark can load multiple Avro files from the same directory by passing the path to that directory. However, passing multiple paths separated by commas didn't work. Is there any way to load all Avro files in multiple directories using sqlContext.avroFile? On Wed, Jan 14, 2015 at
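One possible workaround, not confirmed anywhere in the thread: load each directory separately and union the resulting SchemaRDDs, since they share a schema. The paths below are placeholders.

    import com.databricks.spark.avro._

    // Load each directory on its own, then combine into one SchemaRDD.
    val dirs = Seq("/data/avro/2015-01-13", "/data/avro/2015-01-14")
    val combined = dirs.map(dir => sqlContext.avroFile(dir)).reduce(_ unionAll _)
    combined.registerTempTable("data")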

Re: Using Spark SQL with multiple (avro) files

2015-01-14 Thread Yana Kadiyska
If the wildcard path you have doesn't work, you should probably open a bug -- I had a similar problem with Parquet, and it was a bug which recently got closed. Not sure if sqlContext.avroFile shares a codepath with .parquetFile... you can try running with bits that have the fix for .parquetFile or
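A quick check along those lines (paths hypothetical): if the same wildcard resolves for Parquet but not for Avro, the two methods probably don't share the fixed codepath.

    // Try the same wildcard through both loaders and compare.
    val viaParquet = sqlContext.parquetFile("/data/parquet/2015-01-*")
    val viaAvro    = sqlContext.avroFile("/data/avro/2015-01-*")
    println(s"parquet rows: ${viaParquet.count()}, avro rows: ${viaAvro.count()}")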

Re: Using Spark SQL with multiple (avro) files

2015-01-14 Thread David Jones
Should I be able to pass multiple paths separated by commas? I haven't tried it, but I didn't think it would work. I'd expected a function that accepts a list of strings. On Wed, Jan 14, 2015 at 3:20 PM, Yana Kadiyska yana.kadiy...@gmail.com wrote: If the wildcard path you have doesn't work you should

Using Spark SQL with multiple (avro) files

2015-01-14 Thread David Jones
Hi, I have a program that loads a single Avro file using Spark SQL, queries it, transforms it, and then outputs the data. The file is loaded with:

    val records = sqlContext.avroFile(filePath)
    records.registerTempTable("data")
    ...

Now I want to run it over tens of thousands of Avro files
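For context, a self-contained version of that snippet is sketched below, assuming the com.databricks:spark-avro package provides avroFile; the app name, query, and path argument are illustrative, not from the original message.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext
    import com.databricks.spark.avro._

    object AvroQuery {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("avro-query"))
        val sqlContext = new SQLContext(sc)

        // Load a single Avro file as a SchemaRDD and expose it to SQL.
        val records = sqlContext.avroFile(args(0))
        records.registerTempTable("data")

        // Query, then print a small sample of the result.
        val result = sqlContext.sql("SELECT * FROM data LIMIT 10")
        result.collect().foreach(println)
      }
    }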