On Sep 27, 2010, at 13:46, Edward Capriolo wrote:

> On Mon, Sep 27, 2010 at 11:13 AM, Keith Wiley <[email protected]> wrote:
>> On 2010, Sep 27, at 7:02 AM, Edward Capriolo wrote:
>>
>>> On Mon, Sep 27, 2010 at 3:23 AM, Keith Wiley <[email protected]> wrote:
>>>>
>>>> Is there a particularly good reason for why the "hadoop fs" command
>>>> supports -cat and -tail, but not -head?
>>>>
>>>
>>> Tail is needed to be done efficiently but head you can just do
>>> yourself. Most people probably use
>>>
>>> hadoop dfs -cat file | head -5.
>>
>>
>> I disagree with your use of the word "efficiently". :-) To my
>> understanding (and perhaps that's the source of my error), the approach you
>> suggested reads the entire file over the net from the cluster to your client
>> machine. That file could conceivably be of HDFS scales (100s of GBs, even
>> TBs wouldn't be uncommon).
>>
>> What do you think? Am I wrong in my interpretation of how
>> hadoopCat-pipe-head would work?
>>
> 'hadoop dfs -cat' will output the file as it is read. head -5 will
> kill the first half of the pipe after 5 lines. With buffering more
> might be physically read than 5 lines but this invocation does not
> read the entire HDFS file before piping it to head.

Excellent. Thank you.

________________________________________________________________________________
Keith Wiley     [email protected]     www.keithwiley.com

"I used to be with it, but then they changed what it was. Now, what I'm with
isn't it, and what's it seems weird and scary to me."
                                           -- Abe (Grandpa) Simpson
________________________________________________________________________________
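
For anyone finding this thread in the archive, the pipe behavior described above can be tried without a cluster. The following is only a minimal sketch, not Hadoop itself: `produce` is a hypothetical stand-in for `hadoop dfs -cat file` that would print lines forever if nothing stopped it. Once `head -5` has its five lines it exits, the read end of the pipe closes, and the producer's next write fails (or it is killed by SIGPIPE), so the pipeline ends after a small, buffer-sized amount of extra output rather than after the whole stream.

```shell
#!/bin/sh
# Hypothetical stand-in for 'hadoop dfs -cat file': an endless line producer.
produce() {
    i=0
    while :; do
        i=$((i + 1))
        # If the reader has gone away, the write fails (or SIGPIPE ends
        # this subshell); either way we stop producing.
        echo "line $i" || break
    done
}

# Same shape as: hadoop dfs -cat file | head -5
# This returns promptly even though produce never finishes on its own.
first_lines=$(produce | head -5)
printf '%s\n' "$first_lines"
```

The same reasoning is why the `hadoop dfs -cat file | head -5` form does not need to pull the whole file to the client, though how much extra gets read beyond the five lines depends on buffering along the way.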
