Re: Loading data into HDFS

2007-08-07 Thread Eric Baldeschwieler
system. Thus we have to rotate logfiles at a greater frequency than we'd like in order to checkpoint the data into HDFS. The system certainly isn't perfect but bulk-loading the data into HDFS was proving rather slow. I'd be curious to hear actual performance numbers and methodologies for bulk loads

Re: Loading data into HDFS

2007-08-07 Thread Jim Kellerman
This request isn't so much about loading data into HDFS, but we really need the ability to create a file that supports atomic appends for the HBase redo log. Since HDFS files currently don't exist until they are closed, the best we can do right now is close the current redo log and open a new one
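The workaround described above (close the current redo log so it becomes visible, then open a new one) can be sketched roughly as follows. This is only an illustration on the local filesystem, not HBase or HDFS code; the class name `RollingRedoLog`, the `redo-NNNNNN.log` naming scheme, and the record format are all hypothetical.

```python
import os
import tempfile


class RollingRedoLog:
    """Hypothetical sketch: a log that is "rolled" by closing the current
    file and opening a fresh one, since a file only counts as visible
    to readers once it has been closed."""

    def __init__(self, directory):
        self.directory = directory
        self.sequence = 0
        self.closed_logs = []  # paths that have been closed (visible)
        self._open_next()

    def _open_next(self):
        # Assumed naming scheme, for illustration only.
        name = f"redo-{self.sequence:06d}.log"
        self.path = os.path.join(self.directory, name)
        self.handle = open(self.path, "w")

    def append(self, record):
        # Records go to the currently open (not yet visible) file.
        self.handle.write(record + "\n")

    def roll(self):
        # Close the current log so it becomes visible, then start a new one.
        self.handle.close()
        self.closed_logs.append(self.path)
        self.sequence += 1
        self._open_next()

    def close(self):
        self.handle.close()
        self.closed_logs.append(self.path)


workdir = tempfile.mkdtemp()
log = RollingRedoLog(workdir)
log.append("put row1")
log.roll()               # first log is now closed and readable
log.append("put row2")
log.close()
```

The cost of this approach is that any record written since the last roll sits in a file that is not yet visible, which is presumably why an atomic append would be preferable.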

Re: Loading data into HDFS

2007-08-07 Thread Ted Dunning
certainly isn't perfect but bulk-loading the data into HDFS was proving rather slow. I'd be curious to hear actual performance numbers and methodologies for bulk loads. I'll try to dig some up myself on Monday. On 8/2/07, Dennis Kubes [EMAIL PROTECTED] wrote: You can copy data from any

RE: Loading data into HDFS

2007-08-07 Thread Runping Qi
Hadoop Aggregate package (o.a.h.mapred.lib.aggregate) is a good fit for your aggregation problem. Runping -Original Message- From: Ted Dunning [mailto:[EMAIL PROTECTED] Sent: Tuesday, August 07, 2007 12:09 PM To: hadoop-user@lucene.apache.org Subject: Re: Loading data into HDFS

Re: Loading data into HDFS

2007-08-03 Thread Venkates .P.B.
Am I missing something very fundamental? Can someone comment on these queries? Thanks, Venkates P B On 8/1/07, Venkates .P.B. [EMAIL PROTECTED] wrote: A few queries regarding the way data is loaded into HDFS. - Is it a common practice to load the data into HDFS only through the master node

Re: Loading data into HDFS

2007-08-03 Thread Dmitry
thanks, DT www.ejinz.com Search News - Original Message - From: Venkates .P.B. [EMAIL PROTECTED] To: hadoop-user@lucene.apache.org Sent: Friday, August 03, 2007 1:41 AM Subject: Re: Loading data into HDFS Am I missing something very fundamental? Can someone comment on these queries

Re: Loading data into HDFS

2007-08-03 Thread Dennis Kubes
You can copy data from any node, so if you can do it from multiple nodes your performance would be better (although be sure not to overlap files). The master node is updated once the block has been copied its replication number of times. So if default replication is 3 then the 3 replicas must
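One way to avoid overlapping files when loading from several nodes is to split the file list up front so that each file is assigned to exactly one node. A minimal sketch, assuming a simple round-robin split; the function name `partition_files`, the node names, and the file names are made up for illustration:

```python
def partition_files(files, nodes):
    """Assign each file to exactly one node, round-robin, so that no two
    nodes ever try to copy the same file."""
    assignment = {node: [] for node in nodes}
    for i, path in enumerate(sorted(files)):
        assignment[nodes[i % len(nodes)]].append(path)
    return assignment


# Hypothetical inputs, for illustration only.
files = [f"log-{i:03d}.txt" for i in range(7)]
nodes = ["node-a", "node-b"]

plan = partition_files(files, nodes)
for node, paths in plan.items():
    print(node, paths)
```

Each node would then copy only its own share into HDFS, for example with `hadoop fs -put`.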

Loading data into HDFS

2007-08-01 Thread Venkates .P.B.
A few queries regarding the way data is loaded into HDFS. - Is it a common practice to load the data into HDFS only through the master node? We are able to copy only around 35 logs (64K each) per minute in a 2 slave configuration. - We are concerned about the time it would take to update filenames and
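For a sense of scale, the figures quoted above can be turned into a throughput number. This is a back-of-the-envelope sketch that assumes "64K" means 64 kilobytes per log; the function name is hypothetical.

```python
LOGS_PER_MINUTE = 35   # from the message above
LOG_SIZE_KB = 64       # assumption: "64K" means 64 kilobytes


def throughput_kb_per_sec(files_per_minute, file_size_kb):
    """Convert a files-per-minute copy rate into KB per second."""
    return files_per_minute * file_size_kb / 60.0


rate = throughput_kb_per_sec(LOGS_PER_MINUTE, LOG_SIZE_KB)
print(f"{rate:.1f} KB/s")  # prints "37.3 KB/s"
```

That works out to roughly 2,240 KB per minute. A rate this low may point to per-file overhead rather than raw bandwidth, though the message itself does not say where the time goes.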