Brian, I saw that Stuart mentions slow writes to SequenceFile here:
<http://stuartsierra.com/2008/04/24/a-million-little-files>. If that is the
cause, I will either use his tar approach or try to parallelize the writes
if I can.
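
As a rough illustration of the packing step behind a tar-based approach, here
is a minimal Python sketch that bundles a directory of small files into one
uncompressed tar archive. The function name pack_directory and its arguments
are hypothetical, and this is not Stuart's tool; it only shows the general
idea of turning many small files into one large one before upload.

```python
import tarfile
from pathlib import Path


def pack_directory(src_dir: str, tar_path: str) -> int:
    """Bundle every regular file under src_dir into one uncompressed tar.

    Returns the number of files added. tar_path should live outside
    src_dir, otherwise the archive would try to include itself.
    """
    src = Path(src_dir)
    count = 0
    with tarfile.open(tar_path, "w") as tar:
        # Sort for a deterministic member order.
        for path in sorted(src.rglob("*")):
            if path.is_file():
                # Store paths relative to src_dir inside the archive.
                tar.add(path, arcname=str(path.relative_to(src)))
                count += 1
    return count
```

The resulting single archive could then be copied with hadoop fs -put, which
is reported further down the thread to be fast for large files.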

On Tue, Feb 10, 2009 at 11:14 PM, Brian Bockelman <bbock...@cse.unl.edu> wrote:

>
> On Feb 10, 2009, at 11:09 PM, Mark Kerzner wrote:
>
>> Brian, large files using command-line hadoop go fast, so it is something
>> about my computer or network. I won't worry about this now, especially in
>> light of Amit reporting fast writes and reads.
>>
>
> You're creating files using SequenceFile, right?  It might be that the
> creation of the sequence file is the portion which is slow, not the network
> I/O.
>
> I don't have much knowledge about optimization of SequenceFile creation.  I
> assume that you'll want to start by tweaking compression on and off.
>  Additionally, Jeff (I think) pointed to a Hadoop Archive file, which also
> might be an alternative for your system.  I don't know enough to give you a
> set of pros and cons, just enough to mention it as an alternative to
> experiment with.
>
> Sorry I'm not useful here...
>
> Brian
>
>
>
>>
>> Mark
>>
>> On Tue, Feb 10, 2009 at 5:00 PM, Brian Bockelman <bbock...@cse.unl.edu> wrote:
>>
>>
>>> On Feb 10, 2009, at 4:53 PM, Mark Kerzner wrote:
>>>
>>>> Brian, I have a similar question: why does transfer from a local filesystem
>>>> to SequenceFile take so long (about 1 second per megabyte)?
>>>>
>>>>
>>> Hey Mark,
>>>
>>> I saw your question about speed the other day ... unfortunately, I didn't
>>> have any specific advice so I stayed quiet :)
>>>
>>> In a correctly configured cluster, performance is mostly limited by
>>> available hardware.  If it's obvious that performance is well below hardware
>>> limits (such as in your case), it's usually because (a) you're not generating
>>> files fast enough or (b) something is configured wrong.
>>>
>>> Have you tried a plain hadoop fs -put .... of some large file hanging around
>>> locally?  If that doesn't go faster than 5MB/s or so (when your hardware can
>>> obviously do such a rate), then there's probably a configuration issue.
>>>
>>> Brian
>>>
>>>
>>>
>>>> Thank you,
>>>> Mark
>>>>
>>>> On Tue, Feb 10, 2009 at 4:46 PM, Brian Bockelman <bbock...@cse.unl.edu> wrote:
>>>>
>>>>>
>>>>
>>>>
>>>>> On Feb 10, 2009, at 4:10 PM, Wasim Bari wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Could someone help me find some real figures (transfer rates) for
>>>>>> Hadoop file transfers from a local filesystem to HDFS, S3, etc., and
>>>>>> between storage systems (HDFS to S3, etc.)?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Wasim
>>>>>>
>>>>>
>>>>> What are you looking for?  Maximum possible transfer rate?  Maximum
>>>>> possible transfer rate per client?  Generally, if you're using the Java
>>>>> client, transfer rate to/from HDFS is limited by the hardware you have and
>>>>> the network connection (if you have 1Gbps per client).
>>>>>
>>>>> I could give you a graph showing a peak of 9Gbps from our Hadoop instance
>>>>> to the WAN, but that's not very interesting if you don't have a 10Gbps
>>>>> pipe...
>>>>>
>>>>> Brian
>>>>>
>>>>>
>>>>>
>>>>>
>>>
>
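
For scale, the rates quoted in this thread can be put into the same units.
The short Python sketch below is only back-of-the-envelope arithmetic; it
assumes decimal units (1 MB = 10^6 bytes, 1 Gbps = 10^9 bits per second) and
ignores protocol overhead, neither of which is stated in the thread.

```python
def gbit_per_s_to_mb_per_s(gbit_per_s: float) -> float:
    """Convert a link speed in Gbit/s to MB/s (decimal units, no overhead)."""
    return gbit_per_s * 1000 / 8


# "About 1 second per Meg" is roughly 1 MB/s.
observed_mb_per_s = 1.0

# Rule of thumb from the thread: hadoop fs -put should exceed about 5 MB/s.
rule_of_thumb_mb_per_s = 5.0

# Theoretical ceiling of the 1Gbps-per-client link mentioned in the thread.
link_mb_per_s = gbit_per_s_to_mb_per_s(1.0)

# Fraction of the link that the observed rate would use.
link_fraction = observed_mb_per_s / link_mb_per_s

print(f"1Gbps link ceiling: {link_mb_per_s} MB/s")
print(f"observed rate uses {link_fraction:.1%} of that link")
```

Under these assumptions, the observed rate sits below the 5MB/s rule of thumb
and far below what a 1Gbps link could carry, which fits the suggestion that
the network is not the bottleneck.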
