Hello Olivier,

It works like a charm :)
While we are on the subject, I've sent an email to common-u...@hadoop.apache.org about hdfs that remained unanswered. I'll reproduce it here, as I think this is a better place for it:

I want to achieve the 'hadoop dfs -getmerge' functionality over http. The closest I could find is the 'Download this file' link, but that is available only for individual parts, not the whole directory
( http://hadoop:50075/streamFile?filename=%2Fuser%2Fhadoop-user%2Foutput%2Fsolr%2F%2Fpart-00000 )

It seems that you can push a csv file to Solr 1.4 by url, that is, a link to the actual csv file. The problem is that a directory is not available for download as a merged file in the hadoop hdfs-over-http interface, just the individual parts.

As all the pieces are already there, it doesn't make sense to me to add an http (Apache?) server to the mix just to serve the processed files. I should be able to do that with a special url or something, maybe along the lines of ... /streamMergedFile?whateverPathToAFileOrDir

As you can see, it's related to my initial question on this thread :)

thanks for your time,
alex

On Tue, Mar 16, 2010 at 4:52 PM, Varene Olivier <var...@echo.fr> wrote:
>
> Supposing you do have your part-r-XXXX fully ordered,
>
> you can do
>
> hadoop dfs -cat "output/solr/part-*" > yourLocalFile
>
> tada :)
>
> Cheers
> Olivier
>
>
> Alex Parvulescu wrote:
>
>> Hello,
>>
>> one minor correction.
>>
>> I'm talking about 'hadoop dfs -getmerge'. You are right, '-cat' is the
>> equivalent of '-get' and they both handle only files.
>>
>> I'd like to see an equivalent of 'getmerge' to stdout.
>>
>> sorry for the confusion
>> alex
>>
>> On Tue, Mar 16, 2010 at 11:31 AM, Alex Parvulescu
>> <alex.parvule...@gmail.com> wrote:
>>
>>     Hello Olivier,
>>
>>     I've tried 'cat'. This is the error I get: 'cat: Source must be a file.'
>>     It happens when I try to get all the parts from a directory as a
>>     single .csv file.
>>
>>     Something like this:
>>     hadoop dfs -cat hdfs://master:54310/user/hadoop-user/output/solr/
>>     cat: Source must be a file.
>>
>>     This is what the dir looks like:
>>     hadoop dfs -ls hdfs://master:54310/user/hadoop-user/output/solr/
>>     Found 3 items
>>     drwxr-xr-x   - hadoop supergroup          0 2010-03-12 16:36 /user/hadoop-user/output/solr/_logs
>>     -rw-r--r--   2 hadoop supergroup   64882566 2010-03-12 16:36 /user/hadoop-user/output/solr/part-00000
>>     -rw-r--r--   2 hadoop supergroup   51388943 2010-03-12 16:36 /user/hadoop-user/output/solr/part-00001
>>
>>     It seems -get can merge everything into one file but cannot output to
>>     stdout, while 'cat' can write to stdout but seems to require fetching
>>     the parts one by one.
>>
>>     Or am I missing something?
>>
>>     thanks,
>>     alex
>>
>>
>>     On Tue, Mar 16, 2010 at 11:28 AM, Varene Olivier <var...@echo.fr> wrote:
>>
>>         Hello Alex,
>>
>>         get writes a file to your local FileSystem:
>>
>>         hadoop dfs [-get [-ignoreCrc] [-crc] <src> <localdst>]
>>
>>         with
>>         src : your file in your hdfs
>>         localdst : the name of the file with the collected data (from src)
>>         on your local filesystem
>>
>>         To get the results to STDOUT,
>>         you can use cat:
>>
>>         hadoop dfs [-cat <src>]
>>
>>         with
>>         src : your file in your hdfs
>>
>>         Regards
>>         Olivier
>>
>>         Alex Parvulescu wrote:
>>
>>             Hello,
>>
>>             Is there a reason why 'hadoop dfs -get' will not
>>             output to stdout?
>>
>>             I see 'hadoop dfs -put' can handle stdin, so it would seem
>>             natural for dfs to also support outputting to stdout.
>>
>>             thanks,
>>             alex
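
P.S. In case it helps, a rough sketch of the workaround I have in mind for now: push the merged stream into Solr instead of having Solr pull a url. Something roughly along these lines should work with Solr 1.4's CSV update handler (the solr host, port and url below are placeholders, not my actual setup, and depending on whether the part files carry a header row the 'fieldnames' parameter may also be needed):

    # merge all the parts to stdout and POST them to Solr's CSV handler in one shot
    hadoop dfs -cat "output/solr/part-*" | \
      curl -s 'http://solrhost:8983/solr/update/csv?commit=true' \
           -H 'Content-Type: text/plain; charset=utf-8' \
           --data-binary @-

It still isn't the /streamMergedFile url I'd like to see on the hdfs http interface, but it at least avoids writing an intermediate merged file or standing up an extra http server just to serve it.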