Re: Wildcard support in input path

Jianshi Huang Tue, 17 Jun 2014 23:27:08 -0700

Hi all,

Thanks for the reply. I'm using parquetFile as input; is that a problem? With
hadoop fs -ls, the path
(hdfs://domain/user/jianshuang/data/parquet/table/month=2014*)
lists all the files.
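
If the glob turns out not to be expanded for this input type, one possible
workaround is to expand the pattern by hand and pass explicit paths. Below is
a minimal sketch of that idea, not something I have run against the cluster.
It uses Python's fnmatch as a stand-in for Hadoop's glob matching (an
assumption: Hadoop's syntax also supports forms such as {a,b} that fnmatch
does not), and the partition names and the expand helper are made up for
illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical partition directories under the table path;
# on a real cluster these would come from a directory listing.
partitions = ["month=201312", "month=201401", "month=201402", "month=201501"]


def expand(pattern, names):
    """Return the names that match a shell-style pattern, sorted."""
    return sorted(n for n in names if fnmatchcase(n, pattern))


# Expand the wildcard ourselves instead of relying on the reader to do it.
matched = expand("month=2014*", partitions)

base = "hdfs://domain/user/jianshuang/data/parquet/table"
paths = [base + "/" + name for name in matched]

for p in paths:
    print(p)
```

The resulting list could then be loaded path by path, or joined with commas
for sc.textFile, which accepts a comma-separated list of input paths.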


I'll test it again.

Jianshi


On Wed, Jun 18, 2014 at 2:23 PM, Jianshi Huang <jianshi.hu...@gmail.com>
wrote:

> Hi Andrew,
>
> Strangely, my Spark log (1.0.0, compiled against Hadoop 2.4.0) says
> file not found. I'll try again.
>
> Jianshi
>
>
> On Wed, Jun 18, 2014 at 12:36 PM, Andrew Ash <and...@andrewash.com> wrote:
>
>> In Spark you can use the normal globs supported by Hadoop's FileSystem,
>> which are documented here:
>> http://hadoop.apache.org/docs/r2.3.0/api/org/apache/hadoop/fs/FileSystem.html#globStatus(org.apache.hadoop.fs.Path)
>>
>>
>> On Wed, Jun 18, 2014 at 12:09 AM, MEETHU MATHEW <meethu2...@yahoo.co.in>
>> wrote:
>>
>>> Hi Jianshi,
>>>
>>> I have used wildcard characters (*) in my program and it worked.
>>> My code was like this:
>>> b = sc.textFile("hdfs:///path to file/data_file_2013SEP01*")
>>>
>>> Thanks & Regards,
>>> Meethu M
>>>
>>>
>>>   On Wednesday, 18 June 2014 9:29 AM, Jianshi Huang <
>>> jianshi.hu...@gmail.com> wrote:
>>>
>>>
>>>  It would be convenient if Spark's textFile, parquetFile, etc. could
>>> support paths with wildcards, such as:
>>>
>>>   hdfs://domain/user/jianshuang/data/parquet/table/month=2014*
>>>
>>>  Or is there already a way to do it now?
>>>
>>> Jianshi
>>>
>>> --
>>> Jianshi Huang
>>>
>>> LinkedIn: jianshi
>>> Twitter: @jshuang
>>> Github & Blog: http://huangjs.github.com/
>>>
>>>
>>>
>>
>
>
> --
> Jianshi Huang
>
> LinkedIn: jianshi
> Twitter: @jshuang
> Github & Blog: http://huangjs.github.com/
>



-- 
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
