On Sep 27, 2010, at 13:46, Edward Capriolo wrote:

> On Mon, Sep 27, 2010 at 11:13 AM, Keith Wiley <[email protected]> wrote:
>> On 2010, Sep 27, at 7:02 AM, Edward Capriolo wrote:
>> 
>>> On Mon, Sep 27, 2010 at 3:23 AM, Keith Wiley <[email protected]> wrote:
>>>> 
>>>> Is there a particularly good reason why the "hadoop fs" command
>>>> supports -cat and -tail, but not -head?
>>>> 
>>> 
>>> Tail needs to be done efficiently, but head you can just do
>>> yourself. Most people probably use
>>> 
>>> hadoop dfs -cat file | head -5
>> 
>> 
>> I disagree with your use of the word "efficiently".  :-)  To my
>> understanding (and perhaps that's the source of my error), the approach you
>> suggested reads the entire file over the net from the cluster to your client
>> machine.  That file could conceivably be of HDFS scale (hundreds of GBs;
>> even TBs wouldn't be uncommon).
>> 
>> What do you think?  Am I wrong in my interpretation of how
>> hadoopCat-pipe-head would work?
>> 
> 'hadoop dfs -cat' will output the file as it is read. head -5 will
> exit after 5 lines, which kills the first half of the pipe. With
> buffering, somewhat more than 5 lines might physically be read, but this
> invocation does not read the entire HDFS file before piping it to head.
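A minimal sketch of the behavior described above, assuming a POSIX shell with `seq` available. `hdfs_head` is a hypothetical helper name, not a hadoop subcommand; it just wraps the `-cat | head` pipeline. The last line reproduces the same early-termination effect with a local stand-in producer, so it can be tried without a cluster.

```shell
#!/bin/sh
# Hypothetical helper (not part of the hadoop CLI): print the first N lines
# (default 10) of a file stored in HDFS by streaming it through head.
hdfs_head() {
    hadoop fs -cat "$1" | head -n "${2:-10}"
}

# Same mechanism with a local stand-in producer: seq would emit a billion
# lines, but head exits after five, so seq is cut off early instead of
# running to completion. Prints 1 through 5, one per line.
seq 1 1000000000 | head -n 5
```

The pipeline returns almost immediately, which is the point made above: the reader stopping is what bounds how much the producer reads, not the size of the source file.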


Excellent.  Thank you.

________________________________________________________________________________
Keith Wiley               [email protected]               www.keithwiley.com

"I used to be with it, but then they changed what it was.  Now, what I'm with
isn't it, and what's it seems weird and scary to me."
  -- Abe (Grandpa) Simpson
________________________________________________________________________________