Re: Announcing CloudBase-1.3.1 release

2009-06-21 Thread Edisonxp



Sent from my iPhone

On 2009-6-21, at 9:16, Leo Dagum leo_da...@yahoo.com wrote:

Thanks for the kind words.  I'm copying this reply to the cloudbase
mailing list, as further discussion is more appropriate there.


The way indexing is currently implemented, you need to manually
update the index as you add more data.  If you have not been doing
this, that may explain why you are seeing such performance
degradation as the data grows.  If that doesn't help, we'd love
to work with you to resolve the issue.


- leo





From: Edisonxp ediso...@gmail.com
To: core-user@hadoop.apache.org core-user@hadoop.apache.org
Sent: Saturday, June 20, 2009 7:21:11 AM
Subject: Re: Announcing CloudBase-1.3.1 release

I've used CloudBase for several months, since version 1.1. It's a
good product; your team did a great job.
Lately I've been struggling with a problem: as the data grows, my
daily queries become slower and slower. I hope CloudBase will add a
'partition' feature at table-creation time. I tried 'index', but it
seemed to change nothing.


Sent from my iPhone

On 2009-6-20, at 3:19, Leo Dagum leo_da...@yahoo.com wrote:


CloudBase is a data warehouse system for Terabyte & Petabyte scale
analytics. It is built on top of Hadoop. It allows you
to query flat files using ANSI SQL.


We have released version 1.3.1 of CloudBase on SourceForge:
https://sourceforge.net/projects/cloudbase

Please give it a try and send us your feedback.

You can follow CloudBase-related discussion on the Google mailing list:

cloudbase-us...@googlegroups.com



Release notes -

New Features:
* CREATE CSV tables - One can create tables on top of data in CSV
(Comma Separated Values) format and query them using SQL. The current
implementation doesn't accept CSV records that span multiple lines;
data may not be processed correctly if a field contains embedded
line-breaks. Please visit http://cloudbase.sourceforge.net/index.html#userDoc
for a detailed specification of the CSV format.

Bug fixes:
* Aggregate function 'AVG' returns the same value as 'SUM' function
* If a query has multiple aliases, only the last alias works
--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google  
Groups CloudBase group.

To post to this group, send email to cloudbase-us...@googlegroups.com
To unsubscribe from this group, send email to 
cloudbase-users+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/cloudbase-users?hl=en
-~--~~~~--~~--~--~---



Re: Too many open files error, which gets resolved after some time

2009-06-21 Thread Stas Oskin
Hi.

After tracing some more with the lsof utility, I managed to stop the
descriptor growth on the DataNode process, but I still have issues with my DFS client.

It seems that my DFS client opens hundreds of pipes and eventpolls. Here is
a small part of the lsof output:

java  10508  root  387w  FIFO  0,6        6142565  pipe
java  10508  root  388r  FIFO  0,6        6142565  pipe
java  10508  root  389u  0000  0,10    0  6142566  eventpoll
java  10508  root  390u  FIFO  0,6        6135311  pipe
java  10508  root  391r  FIFO  0,6        6135311  pipe
java  10508  root  392u  0000  0,10    0  6135312  eventpoll
java  10508  root  393r  FIFO  0,6        6148234  pipe
java  10508  root  394w  FIFO  0,6        6142570  pipe
java  10508  root  395r  FIFO  0,6        6135857  pipe
java  10508  root  396r  FIFO  0,6        6142570  pipe
java  10508  root  397r  0000  0,10    0  6142571  eventpoll
java  10508  root  398u  FIFO  0,6        6135319  pipe
java  10508  root  399w  FIFO  0,6        6135319  pipe

I'm using FSDataInputStream and FSDataOutputStream; could this be related
to the pipes?

So, my questions are:

1) What causes these pipes/epolls to appear?

2) More importantly, how can I prevent their accumulation and growth?

Thanks in advance!

2009/6/21 Stas Oskin stas.os...@gmail.com

 Hi.

 I have HDFS client and HDFS datanode running on same machine.

 When I try to access a dozen files at once from the client, several times in
 a row, I start to receive the following errors from the client and from the
 HDFS browse function.

 HDFS Client: Could not get block locations. Aborting...
 HDFS browse: Too many open files

 I can increase the maximum number of files that can be opened, as I have it
 set to the default of 1024, but I would like to solve the underlying problem
 first, since a larger value just means running out of descriptors again later.

 So my questions are:

 1) Does the HDFS datanode keep any files open, even after the HDFS
 client has already closed them?

 2) Is it possible to find out who keeps the files open - the datanode or the
 client (so I can pinpoint the source of the problem)?

 Thanks in advance!



Re: Too many open files error, which gets resolved after some time

2009-06-21 Thread jason hadoop
HDFS/DFS client uses quite a few file descriptors for each open file.

Many application developers (though not the Hadoop core) rely on the JVM
finalizer methods to close open files.

This combination, especially when many HDFS files are open, can result in
very large file-descriptor demands for Hadoop clients.
As a general rule we never run a cluster with an nofile limit of less than
64k, and for larger clusters with demanding applications we have set it 10x
higher. I also believe there was a set of JVM versions that leaked the file
descriptors used for NIO in the HDFS core, but I do not recall the exact details.
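The advice above - close streams deterministically instead of leaving it to finalizers - can be sketched as a try/finally pattern. This is a minimal stand-alone example using java.io streams as stand-ins; FSDataInputStream and FSDataOutputStream follow the same close() contract, but nothing here is Hadoop-specific.

```java
import java.io.*;

public class CloseEarly {
    // Read a file and return its byte count, releasing the descriptor
    // in a finally block instead of waiting for the finalizer thread.
    static long countBytes(File f) throws IOException {
        InputStream in = new FileInputStream(f); // stand-in for FSDataInputStream
        try {
            long n = 0;
            byte[] buf = new byte[8192];
            int r;
            while ((r = in.read(buf)) != -1) n += r;
            return n;
        } finally {
            in.close(); // descriptor released here, at a known point
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("demo", ".dat");
        f.deleteOnExit();
        OutputStream out = new FileOutputStream(f); // stand-in for FSDataOutputStream
        try {
            out.write(new byte[1234]);
        } finally {
            out.close();
        }
        System.out.println(countBytes(f)); // prints 1234
    }
}
```

On Java 7+ the try/finally pairs collapse into try-with-resources, but the effect is the same: each descriptor is released the moment the stream goes out of scope, not whenever the finalizer thread gets around to it.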

On Sun, Jun 21, 2009 at 5:27 AM, Stas Oskin stas.os...@gmail.com wrote:




-- 
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals


RE: Too many open files error, which gets resolved after some time

2009-06-21 Thread Brian.Levine
IMHO, you should never rely on finalizers to release scarce resources since you 
don't know when the finalizer will get called, if ever.

-brian

 



Re: Too many open files error, which gets resolved after some time

2009-06-21 Thread jason hadoop
Just to be clear, I second Brian's opinion. Relying on finalizers is a very
good way to run out of file descriptors.

On Sun, Jun 21, 2009 at 9:32 AM, brian.lev...@nokia.com wrote:






Re: Too many open files error, which gets resolved after some time

2009-06-21 Thread jason hadoop
Yes.
Otherwise the file descriptors will flow away like water.
I also strongly suggest having at least 64k file descriptors as the open
file limit.

On Sun, Jun 21, 2009 at 12:43 PM, Stas Oskin stas.os...@gmail.com wrote:

 Hi.

 Thanks for the advice. So you advise explicitly closing each and every file
 handle that I receive from HDFS?

 Regards.





Re: Too many open files error, which gets resolved after some time

2009-06-21 Thread Scott Carey
Furthermore, if for some reason you need to dispose of resources after other
objects are GC'd, weak references and a weak-reference queue will perform
significantly better in throughput and latency - orders of magnitude better -
than finalizers.
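The weak-reference approach can be sketched as follows. This is an illustrative, self-contained example, not Hadoop's actual cleanup code; the class and method names (HandleTracker, track, drain) are invented for the sketch.

```java
import java.io.Closeable;
import java.io.IOException;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.HashSet;
import java.util.Set;

// Hypothetical registry pairing each handle object with the Closeable it
// owns. When a handle becomes unreachable and is collected, its weak
// reference surfaces on the queue and drain() closes the underlying
// resource - no finalizer thread involved.
public class HandleTracker {
    static final class Entry extends WeakReference<Object> {
        final Closeable resource;
        Entry(Object handle, Closeable resource, ReferenceQueue<Object> q) {
            super(handle, q);
            this.resource = resource;
        }
    }

    private final ReferenceQueue<Object> queue = new ReferenceQueue<Object>();
    private final Set<Entry> live = new HashSet<Entry>(); // keeps Entry objects reachable

    public void track(Object handle, Closeable resource) {
        live.add(new Entry(handle, resource, queue));
    }

    // Close resources whose handles have been collected; returns how many.
    public int drain() throws IOException {
        int closed = 0;
        Entry e;
        while ((e = (Entry) queue.poll()) != null) {
            live.remove(e);
            e.resource.close();
            closed++;
        }
        return closed;
    }
}
```

Unlike a finalizer, drain() runs on the caller's thread at a time of the caller's choosing (for instance, whenever a new file is opened), so cleanup keeps pace with allocation instead of waiting on the single finalizer thread.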

On 6/21/09 9:32 AM, brian.lev...@nokia.com brian.lev...@nokia.com wrote:

IMHO, you should never rely on finalizers to release scarce resources since you 
don't know when the finalizer will get called, if ever.

-brian






Re: Announcing CloudBase-1.3.1 release

2009-06-21 Thread imcaptor
When you use CloudBase, you can create a different table for each daily
file.

For example, your directory will like this.

logs
  /200905
 /20090501.log.gz
 /20090502.log.gz
  /200906
 /20090601.log.gz
 /20090602.log.gz

In this layout you would create seven tables: four daily, two monthly, and one covering everything.
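The table-per-period scheme above can be wrapped in a small naming helper so each query is routed to the narrowest table. The table names and query text below are illustrative assumptions, not CloudBase conventions.

```java
// Hypothetical naming helper for manual partitioning: one table per day,
// one per month, one catch-all covering all history.
public class LogTables {
    static String dailyTable(String yyyymmdd) { return "logs_" + yyyymmdd; }
    static String monthlyTable(String yyyymm) { return "logs_" + yyyymm; }
    static String allTable()                  { return "logs_all"; }

    // A daily report then scans one day's table instead of the full corpus.
    static String dailyCountQuery(String yyyymmdd) {
        return "SELECT COUNT(*) FROM " + dailyTable(yyyymmdd);
    }

    public static void main(String[] args) {
        System.out.println(dailyCountQuery("20090601")); // prints SELECT COUNT(*) FROM logs_20090601
    }
}
```

The win is the same one a built-in partition feature would give: query cost stays proportional to the period queried, not to the total data loaded so far.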

2009/6/20 Edisonxp ediso...@gmail.com:



problem about put a lot of files

2009-06-21 Thread stchu
Hi,
Is there any restriction on the number of files that can be put? I tried to
put/copyFromLocal about 50,573 files to HDFS, but ran into the following problem:

09/06/22 11:34:34 INFO dfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Bad connect ack with firstBadLink 140.96.89.57:51010
09/06/22 11:34:34 INFO dfs.DFSClient: Abandoning block
blk_8245450203753506945_65955
09/06/22 11:34:40 INFO dfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Bad connect ack with firstBadLink 140.96.89.57:51010
09/06/22 11:34:40 INFO dfs.DFSClient: Abandoning block
blk_-8257846965500649510_65956
09/06/22 11:34:46 INFO dfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Bad connect ack with firstBadLink 140.96.89.57:51010
09/06/22 11:34:46 INFO dfs.DFSClient: Abandoning block
blk_4751737303082929912_65956
09/06/22 11:34:56 INFO dfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Bad connect ack with firstBadLink 140.96.89.57:51010
09/06/22 11:34:56 INFO dfs.DFSClient: Abandoning block
blk_5912850890372596972_66040
09/06/22 11:35:02 INFO dfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Bad connect ack with firstBadLink 140.96.89.193:51010
09/06/22 11:35:02 INFO dfs.DFSClient: Abandoning block
blk_6609198685444611538_66040
09/06/22 11:35:08 INFO dfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Bad connect ack with firstBadLink 140.96.89.193:51010
09/06/22 11:35:08 INFO dfs.DFSClient: Abandoning block
blk_6696101244177965180_66040
09/06/22 11:35:17 INFO dfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Bad connect ack with firstBadLink 140.96.89.57:51010
09/06/22 11:35:17 INFO dfs.DFSClient: Abandoning block
blk_-5430033105510098342_66105
09/06/22 11:35:26 INFO dfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Bad connect ack with firstBadLink 140.96.89.57:51010
09/06/22 11:35:26 INFO dfs.DFSClient: Abandoning block
blk_5325140471333041601_66165
09/06/22 11:35:32 INFO dfs.DFSClient: Exception in createBlockOutputStream
problem about put a lot of files

2009-06-21 Thread stchu
Hi,
Is there any restriction on the number of files that can be put? I tried to
put/copyFromLocal about 50,573 files to HDFS, but I ran into a problem:

09/06/22 11:34:34 INFO dfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Bad connect ack with firstBadLink 140.96.89.57:51010
09/06/22 11:34:34 INFO dfs.DFSClient: Abandoning block blk_8245450203753506945_65955
09/06/22 11:34:40 INFO dfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Bad connect ack with firstBadLink 140.96.89.57:51010
09/06/22 11:34:40 INFO dfs.DFSClient: Abandoning block blk_-8257846965500649510_65956
09/06/22 11:34:46 INFO dfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Bad connect ack with firstBadLink 140.96.89.57:51010
09/06/22 11:34:46 INFO dfs.DFSClient: Abandoning block blk_4751737303082929912_65956
09/06/22 11:34:56 INFO dfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Bad connect ack with firstBadLink 140.96.89.57:51010
09/06/22 11:34:56 INFO dfs.DFSClient: Abandoning block blk_5912850890372596972_66040
09/06/22 11:35:02 INFO dfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Bad connect ack with firstBadLink 140.96.89.193:51010
09/06/22 11:35:02 INFO dfs.DFSClient: Abandoning block blk_6609198685444611538_66040
09/06/22 11:35:08 INFO dfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Bad connect ack with firstBadLink 140.96.89.193:51010
09/06/22 11:35:08 INFO dfs.DFSClient: Abandoning block blk_6696101244177965180_66040
09/06/22 11:35:17 INFO dfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Bad connect ack with firstBadLink 140.96.89.57:51010
09/06/22 11:35:17 INFO dfs.DFSClient: Abandoning block blk_-5430033105510098342_66105
09/06/22 11:35:26 INFO dfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Bad connect ack with firstBadLink 140.96.89.57:51010
09/06/22 11:35:26 INFO dfs.DFSClient: Abandoning block blk_5325140471333041601_66165
09/06/22 11:35:32 INFO dfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Bad connect ack with firstBadLink 140.96.89.205:51010
09/06/22 11:35:32 INFO dfs.DFSClient: Abandoning block blk_1121864992752821949_66165
09/06/22 11:35:39 INFO dfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Bad connect ack with firstBadLink 140.96.89.205:51010
09/06/22 11:35:39 INFO dfs.DFSClient: Abandoning block blk_-2096783021040778965_66184
09/06/22 11:35:45 INFO dfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Bad connect ack with firstBadLink 140.96.89.205:51010
09/06/22 11:35:45 INFO dfs.DFSClient: Abandoning block blk_6949821898790162970_66184
09/06/22 11:35:51 INFO dfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Bad connect ack with firstBadLink 140.96.89.205:51010
09/06/22 11:35:51 INFO dfs.DFSClient: Abandoning block blk_4708848202696905125_66184
09/06/22 11:35:57 INFO dfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Bad connect ack with firstBadLink 140.96.89.205:51010
09/06/22 11:35:57 INFO dfs.DFSClient: Abandoning block blk_8031882012801762201_66184
09/06/22 11:36:03 WARN dfs.DFSClient: DataStreamer Exception:
java.io.IOException: Unable to create new block.
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2359)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1800(DFSClient.java:1745)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1922)

09/06/22 11:36:03 WARN dfs.DFSClient: Error Recovery for block blk_8031882012801762201_66184 bad datanode[2]
put: Could not get block locations. Aborting...
Exception closing file /osmFiles/a/109103.gpx.txt
java.io.IOException: Could not get block locations. Aborting...
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2153)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1745)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1899)

===

And I checked the log file on one of the datanodes:

===
2009-06-22 11:34:47,888 INFO org.apache.hadoop.dfs.DataNode: PacketResponder 2 for block blk_1759242372147720864_66183 terminating
2009-06-22 11:34:47,926 INFO org.apache.hadoop.dfs.DataNode: Receiving block blk_-2096783021040778965_66184 src: /140.96.89.224:53984 dest: /140.96.89.224:51010
2009-06-22 11:34:47,926 INFO org.apache.hadoop.dfs.DataNode: writeBlock blk_-2096783021040778965_66184 received exception 

RE: problem about put a lot of files

2009-06-21 Thread zhuweimin
Hi

Linux limits the maximum number of open files per process. Please use ulimit to
view and modify the limit:
1. View the current limits:
   # ulimit -a
2. Raise the open-file limit, for example:
   # ulimit -n 10240
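The two steps above can be sketched as a shell session. The value 10240 is
only the example used here, not a recommendation; the persistent-configuration
path and the "hadoop" user name below are assumptions for a typical PAM-based
Linux system and may differ on your distribution:

```shell
# Show all current per-process limits, including "open files" (nofile)
ulimit -a

# Show the soft and hard open-file limits separately
ulimit -Sn
ulimit -Hn

# Raise the soft limit for the current shell session only; setting it
# above the hard limit requires root, so fall back with a message
ulimit -n 10240 || echo "hard limit too low; raise it as root first"

# To make the change survive logins on PAM-based systems, add lines
# like these to /etc/security/limits.conf (assumed path; "hadoop" is a
# hypothetical user running the Hadoop daemons):
#   hadoop  soft  nofile  10240
#   hadoop  hard  nofile  10240
```

Note that the limit must be raised for the account that actually runs the
Hadoop daemons and the client doing the put, not just the login shell.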

Best wishes

 -Original Message-
 From: stchu [mailto:stchu.cl...@gmail.com]
 Sent: Monday, June 22, 2009 12:57 PM
 To: core-user@hadoop.apache.org
 Subject: problem about put a lot of files
 
 Hi,
 Is there any restriction on the amount of putting files? I tried to
 put/copyFromLocal about 50,573 files to HDFS, but I faced a problem:
 [log output snipped]
 
 And I checked the log file on one of the datanodes: