Re: Wildcard support in input path

2014-06-18 Thread Jianshi Huang
Hi Andrew, Strangely, the log of my Spark build (1.0.0 compiled against Hadoop 2.4.0) says file not found. I'll try again. Jianshi On Wed, Jun 18, 2014 at 12:36 PM, Andrew Ash and...@andrewash.com wrote: In Spark you can use the normal globs supported by Hadoop's FileSystem, which are documented

Re: Wildcard support in input path

2014-06-18 Thread Jianshi Huang
Hi all, Thanks for the reply. I'm using parquetFile as input; is that a problem? With hadoop fs -ls, the path (hdfs://domain/user/jianshuang/data/parquet/table/month=2014*) lists all the files. I'll test it again. Jianshi On Wed, Jun 18, 2014 at 2:23 PM, Jianshi Huang

Re: Wildcard support in input path

2014-06-18 Thread Nicholas Chammas
Is that month= syntax something special, or do your files actually have that string as part of their name? On Wed, Jun 18, 2014 at 2:25 AM, Jianshi Huang jianshi.hu...@gmail.com wrote: Hi all, Thanks for the reply. I'm using parquetFile as input, is that a problem? In hadoop fs -ls, the

Re: Wildcard support in input path

2014-06-18 Thread Jianshi Huang
Hi Nicholas, month= is there so that Hive can auto-discover the partitions. It's part of the URL of my files. Jianshi On Wed, Jun 18, 2014 at 11:52 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Is that month= syntax something special, or do your files actually have that string as part of

Re: Wildcard support in input path

2014-06-18 Thread Nicholas Chammas
I wonder if that’s the problem. Is there an equivalent hadoop fs -ls command you can run that returns the same files you want but doesn’t have that month= string? On Wed, Jun 18, 2014 at 12:25 PM, Jianshi Huang jianshi.hu...@gmail.com wrote: Hi Nicholas, month= is for Hive to auto discover

Wildcard support in input path

2014-06-17 Thread Jianshi Huang
It would be convenient if Spark's textFile, parquetFile, etc. could support paths with wildcards, such as: hdfs://domain/user/jianshuang/data/parquet/table/month=2014* Or is there already a way to do this? Jianshi -- Jianshi Huang LinkedIn: jianshi Twitter: @jshuang Github Blog:

Re: Wildcard support in input path

2014-06-17 Thread MEETHU MATHEW
Hi Jianshi, I have used wildcard characters (*) in my program and it worked. My code was like this: b = sc.textFile("hdfs:///path to file/data_file_2013SEP01*") Thanks & Regards, Meethu M On Wednesday, 18 June 2014 9:29 AM, Jianshi Huang jianshi.hu...@gmail.com wrote: It would be

Re: Wildcard support in input path

2014-06-17 Thread Patrick Wendell
These paths get passed directly to the Hadoop FileSystem API, and I think it supports globbing out of the box. So AFAIK it should just work. On Tue, Jun 17, 2014 at 9:09 PM, MEETHU MATHEW meethu2...@yahoo.co.in wrote: Hi Jianshi, I have used wild card characters (*) in my program and it

Re: Wildcard support in input path

2014-06-17 Thread Andrew Ash
In Spark you can use the normal globs supported by Hadoop's FileSystem, which are documented here: http://hadoop.apache.org/docs/r2.3.0/api/org/apache/hadoop/fs/FileSystem.html#globStatus(org.apache.hadoop.fs.Path) On Wed, Jun 18, 2014 at 12:09 AM, MEETHU MATHEW meethu2...@yahoo.co.in wrote:
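
A rough local illustration of how a pattern such as month=2014* behaves as a glob. This is only a sketch: it uses Python's standard glob module on a temporary local directory, not Hadoop's FileSystem.globStatus, so it does not reproduce Hadoop's exact matching rules. The helper name matching_partitions and the partition folder names are made up for the example. The point is only that "=" is an ordinary character in a glob pattern and "*" matches the rest of the directory name.

```python
import glob
import os
import tempfile


def matching_partitions(table_dir, pattern):
    """Return the sorted names of entries under table_dir that match a glob pattern."""
    # glob.escape guards against glob metacharacters in the directory path itself.
    full_pattern = os.path.join(glob.escape(table_dir), pattern)
    return sorted(os.path.basename(p) for p in glob.glob(full_pattern))


# Hypothetical layout: a throwaway directory mimicking Hive-style partition folders.
table_dir = tempfile.mkdtemp()
for name in ["month=201312", "month=201401", "month=201402"]:
    os.mkdir(os.path.join(table_dir, name))

# "=" is matched literally; "*" matches the remainder of each folder name.
matches = matching_partitions(table_dir, "month=2014*")
print(matches)  # ['month=201401', 'month=201402']
```

If the same pattern lists the expected folders locally and with hadoop fs -ls but Spark still reports file not found, the month= string itself is probably not what breaks the match.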