Hello Olivier

It works like a charm :)

While we are on the subject, I sent an email to
common-u...@hadoop.apache.org about HDFS that went unanswered. I'll
reproduce it here, since I think this is a better place for it:

I want to achieve the 'hadoop dfs -getmerge' functionality over HTTP. The
closest thing I could find is the 'Download this file' link, but that is
available only for individual parts, not for the whole directory (
http://hadoop:50075/streamFile?filename=%2Fuser%2Fhadoop-user%2Foutput%2Fsolr%2F%2Fpart-00000
)
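
For a single part, that streamFile link can at least be scripted directly. A
minimal sketch, assuming the same datanode host/port and path as the link
above:

  curl -o part-00000 "http://hadoop:50075/streamFile?filename=%2Fuser%2Fhadoop-user%2Foutput%2Fsolr%2Fpart-00000"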

It seems that you can push a CSV URL to Solr 1.4, that is, a link to the
actual CSV file. The problem is that the HDFS-over-HTTP interface does not
expose a directory for download as a single merged file, only the
individual parts.
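
For reference, this is roughly how I understand the Solr 1.4 CSV push would
look. Just a sketch: it assumes remote streaming is enabled
(enableRemoteStreaming="true" in solrconfig.xml), and the Solr host and the
CSV URL below are placeholders:

  curl "http://solr-host:8983/solr/update/csv?commit=true&stream.url=http%3A%2F%2Fsome-host%2Fdata%2Fmerged.csv"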

Since all the pieces are already there, it doesn't make sense to me to add an
HTTP (Apache?) server to the mix just to serve the processed files. I should
be able to do this with a special URL or something, maybe along the lines of
... /streamMergedFile?whateverPathToAFileOrDir
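
Until something like that exists, the only workaround I can see that avoids an
extra server is to merge the parts client-side over HTTP, along these lines
(only a sketch; the part names are hard-coded here and would have to be listed
first, e.g. from the namenode's browse interface):

  for part in part-00000 part-00001; do
    curl -s "http://hadoop:50075/streamFile?filename=%2Fuser%2Fhadoop-user%2Foutput%2Fsolr%2F$part"
  done > merged.csv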

As you can see, it's related to my initial question in this thread :)

thanks for your time,
alex

On Tue, Mar 16, 2010 at 4:52 PM, Varene Olivier <var...@echo.fr> wrote:

>
> Supposing your part-r-XXXX files are fully ordered,
>
> you can do
>
> hadoop dfs -cat "output/solr/part-*" > yourLocalFile
>
> tada :)
>
> Cheers
>
> Olivier
>
>
> Alex Parvulescu wrote:
>
>> Hello,
>>
>> one minor correction.
>>
>> I'm talking about 'hadoop dfs -getmerge'. You are right, '-cat' is the
>> equivalent of '-get', and they both handle only files.
>>
>> I'd like to see an equivalent of 'getmerge' to stdout.
>>
>> sorry for the confusion
>> alex
>>
>> On Tue, Mar 16, 2010 at 11:31 AM, Alex Parvulescu <
>> alex.parvule...@gmail.com> wrote:
>>
>>    Hello Olivier,
>>
>>    I've tried 'cat'. This is the error I get: 'cat: Source must be a file.'
>>    This happens when I try to get all parts from a directory as a
>>    single .csv file.
>>
>>    Something like that:
>>      hadoop dfs -cat hdfs://master:54310/user/hadoop-user/output/solr/
>>      cat: Source must be a file.
>>    This is what the dir looks like:
>>      hadoop dfs -ls hdfs://master:54310/user/hadoop-user/output/solr/
>>      Found 3 items
>>      drwxr-xr-x   - hadoop supergroup          0 2010-03-12 16:36
>>    /user/hadoop-user/output/solr/_logs
>>      -rw-r--r--   2 hadoop supergroup   64882566 2010-03-12 16:36
>>    /user/hadoop-user/output/solr/part-00000
>>      -rw-r--r--   2 hadoop supergroup   51388943 2010-03-12 16:36
>>    /user/hadoop-user/output/solr/part-00001
>>
>>    It seems '-get' can merge everything into one file but cannot output to
>>    stdout, while '-cat' can write to stdout but seems to require fetching
>>    the parts one by one.
>>
>>    Or am I missing something?
>>
>>    thanks,
>>    alex
>>
>>
>>    On Tue, Mar 16, 2010 at 11:28 AM, Varene Olivier <var...@echo.fr> wrote:
>>
>>        Hello Alex,
>>
>>        get writes down a file on your FileSystem
>>
>>        hadoop dfs [-get [-ignoreCrc] [-crc] <src> <localdst>]
>>
>>        with
>>         src : your file in your hdfs
>>         localdst : the name of the file on your local filesystem with the
>>            data collected from src
>>
>>
>>        To get the results to STDOUT,
>>        you can use cat
>>
>>        hadoop dfs [-cat <src>]
>>
>>        with src : your file in your hdfs
>>
>>        Regards
>>        Olivier
>>
>>        Alex Parvulescu wrote:
>>
>>            Hello,
>>
>>            Is there a reason why 'hadoop dfs -get' will not output to
>>            stdout?
>>
>>            I see 'hadoop dfs -put' can handle stdin. It would seem that
>>            dfs should also support outputting to stdout.
>>
>>
>>            thanks,
>>            alex
>>
